Abstract

In many epidemiological contexts, disease occurrences and their 
rates are naturally modelled by counting processes and their intensi- 
ties, allowing an analysis based on martingale methods. These meth- 
ods lend themselves to extensions of nested case-control sampling de- 
signs where general methods of control selection can be easily incor- 
porated. This same methodology allows for extensions of the Mantel- 
Haenszel estimator in two main directions. First, a variety of new 
sampling designs can be incorporated which can yield substantial ef- 
ficiency gains over simple random sampling. Second, the extension 
allows for the treatment of multiple level time dependent exposures. 

1 Introduction 

Mantel-Haenszel estimators (Mantel and Haenszel (1959)) have long been 
used in medical research to quantify one group's risk of disease relative to 
another. An excellent review of the development of the Mantel-Haenszel
estimator for analysis of epidemiologic case-control studies, as well as the 
prominent role it has played in epidemiologic research generally, is given in 
Breslow (1996). 

In this paper we consider Mantel-Haenszel estimators for nested case- 
control studies in which controls are sampled from risk sets determined by 
the cohort failure times (see e.g., Langholz and Goldstein (1996)). In re- 
cent work, Zhang, Fujii, and Yanagawa (2000) defined generalized Mantel- 
Haenszel estimators when controls are a simple random sample from the 
risk set and derived the properties of the estimator for right censored cohort 
data. Further Zhang (2000) developed estimators for a number of methods 
of sampling controls including sampling with and without replacement and 
geometric sampling and showed their consistency. We expand on the work of 
these authors by providing estimators for the entire class of control sampling 
methods considered by Borgan, Goldstein, and Langholz (1995) , defining 
a natural "least squares" extension of the dichotomous covariate Mantel- 
Haenszel estimator to a multi-level covariate, and providing estimators of 
baseline hazard when a Mantel-Haenszel estimator is used for estimation of 
the rate ratio. Further, we show the consistency and asymptotic normality of
these estimators under very general conditions, provide a number of examples
including random sampling, matching, and counter-matching, and use the
asymptotic variance results to compare the variance of the Mantel-Haenszel
estimator to that of the maximum partial likelihood estimator, or MPLE.
Moreover, at the end of Section 3 we show that our extension of the classical
Mantel-Haenszel estimator in the dichotomous exposure situation has the same
asymptotic variance at the null as the MPLE for all such sampling schemes.

In a cohort $\mathcal{R} = \{1, \ldots, n\}$ of individuals followed over a
time interval $[0, \tau]$ with $0 < \tau < \infty$, a natural model relating
failure and a binary exposure $Z$ is that the failure rate for individuals
$i \in \mathcal{R}$ with exposure covariate $Z_i = 1$ (group 1) is increased
by an unknown factor $\phi_0 \in (0, \infty)$ over the failure rate for those
unexposed, with covariate $Z_i = 0$ (group 0). The Mantel-Haenszel estimator
for event time data provides a consistent and asymptotically normal estimate
of the factor $\phi_0$ in the semi-parametric model where individuals share a
common but unknown baseline hazard function $\lambda_0(t)$ and fail at rate
$\lambda_0(t)\phi_0^{Z_i}$ (Robins et al. (1986)). Letting $T_j$ be the
collection of all failure times among the individuals in group $j$, $n_k(t)$
the number of individuals in group $k$ at time $t$ and $n(t)$ the total number
of individuals at risk at time $t$, with

$$R_{jk} = \sum_{t \in T_j} \frac{n_k(t)}{n(t)}, \tag{1}$$

the classical Mantel-Haenszel estimator is explicitly given by

$$\hat\phi_n = \frac{R_{10}}{R_{01}}. \tag{2}$$
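
As a small illustration of the closed form, the following Python sketch
computes the ratio $R_{10}/R_{01}$, taking $R_{jk}$ to be the sum, over the
failure times in group $j$, of $n_k(t)/n(t)$. The function name, the input
layout (one tuple per failure time) and the toy data are assumptions made for
this example only.

```python
def mantel_haenszel(failures):
    """Classical Mantel-Haenszel rate-ratio estimate R_10 / R_01.

    `failures` holds one (group, n0, n1) tuple per failure time, where
    group is the failing subject's exposure group (0 or 1) and n0, n1 are
    the numbers of unexposed and exposed subjects at risk at that time.
    """
    r10 = 0.0  # sum over exposed failures of n_0(t) / n(t)
    r01 = 0.0  # sum over unexposed failures of n_1(t) / n(t)
    for group, n0, n1 in failures:
        n = n0 + n1
        if group == 1:
            r10 += n0 / n
        else:
            r01 += n1 / n
    if r01 == 0:
        raise ValueError("R_01 is zero; the ratio is undefined")
    return r10 / r01


# Hypothetical cohort of 10: risk-set sizes shrink by one after each failure.
failures = [(1, 5, 5), (0, 5, 4), (1, 4, 4), (1, 4, 3)]
phi_hat = mantel_haenszel(failures)
```

No iteration is needed, which is the "closed form" property referred to in the
text.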



It is well known that in the full cohort setting, the Mantel-Haenszel
estimator (2) performs as well as the partial likelihood estimator at the null
$\phi_0 = 1$. One contribution of this work is to show that this property is
maintained when comparing these same two approaches under sampling, which
provides our first reason to study the Mantel-Haenszel estimator. Secondly, we
see by (2) that the classical Mantel-Haenszel estimator, computed from a
cohort consisting of exposed and unexposed individuals, can be given "in
closed form" without requiring the solution of a non-linear estimating
equation, which must be done numerically. Again, this property of the
estimator still prevails when sampling. A third reason to study the
Mantel-Haenszel estimator is its popularity, which continues despite its
efficiency drawbacks away from the null. For instance, a MEDLINE search of
papers in the years 2000-2005 gives a total of 420 references where
Mantel-Haenszel is cited in the abstract as the method applied. Since this
methodology is quite popular, there is value in adapting it to sampling
schemes like counter matching, which make the estimator much more efficient
than its present version. Lastly, we cite
Breslow (1996), himself quoting from page 156 of Kahn and Sempos (1989), 
"As Kahn and Sempos rightly remarked in their 1989 textbook Statistics in 
Epidemiology, when a method is as simple and free of assumptions as the 
Mantel-Haenszel procedure, it deserves a strong recommendation, and we do 
not hesitate to give it." 

Although our main focus is on the use of the Mantel-Haenszel estimator
with various sampling schemes, we also extend its scope of applicability. In
particular, suppose that to each individual $i \in \mathcal{R}$ there is
assigned a time dependent covariate $Z_i(t)$ with values in
$\{a_0, a_1, \ldots, a_\eta\}$, and an indicator $Y_i(t)$ that equals one when
$i$ is observed, and zero otherwise. Letting the failure rate $\lambda_i(t)$
of individual $i$ at time $t$ equal

$$\lambda_i(t) = Y_i(t)\lambda_0(t)\phi_0^{Z_i(t)}, \tag{3}$$

where $\lambda_0(t)$ is an unknown baseline hazard function, gives a model
which accommodates multi-level exposure, censoring, and time dependent
covariates. By incorporating a constant factor into $\lambda_0(t)$ if
necessary, we may assume without loss of generality that
$0 = a_0 < \cdots < a_\eta$. At any time $t$, the collection of individuals at
risk

$$\mathcal{R}(t) = \{i : Y_i(t) = 1\}$$

may be divided into the $\eta + 1$ groups,

$$\mathcal{R}_k(t) = \{i \in \mathcal{R}(t) : Z_i(t) = a_k\} \quad\text{with sizes}\quad n_k(t) = |\mathcal{R}_k(t)|, \quad k = 0, \ldots, \eta.$$

The individuals in $\mathcal{R}_k(t)$ for $k \neq 0$ are said to be exposed,
and have an increased risk of $\phi_0^{a_k}$ over those in $\mathcal{R}_0(t)$.
The classical model under which the Mantel-Haenszel estimator has been
developed is the case $\eta = 1$, $a_1 = 1$.

In many practical situations, sampling schemes are necessary because the
collection of data on the full cohort $\mathcal{R}$ is impractical, expensive,
or impossible. In general a cohort sampling scheme is given by specifying,
for all $i \in r \subset \mathcal{R}$, a collection of probabilities
$\pi_t(r|i)$ for choosing the individuals in the set $r \subset \mathcal{R}(t)$
to serve as controls should $i$ fail at time $t$; we may set
$\pi_t(r|i) = 0$ when $i \notin r$ or if $i$ is not at risk at time $t$. The
flexibility one can gain by the choice of design $\pi_t(r|i)$ is substantial,
opening up the possibility of using sampling designs that can take advantage
of the structure of the data, resulting in substantial increases in
efficiency.

Each design $\pi_t(r|i)$ has an associated probability distribution on the
subsets of $\mathcal{R}$ defined by

$$\pi_t(r) = n(t)^{-1} \sum_{i \in r} \pi_t(r|i), \tag{4}$$

which sums to one by virtue of

$$\sum_{r \subset \mathcal{R}} \sum_{i \in r} \pi_t(r|i) = \sum_{i \in \mathcal{R}} \sum_{r \subset \mathcal{R},\, r \ni i} \pi_t(r|i) = \sum_{i \in \mathcal{R}} Y_i(t) = n(t). \tag{5}$$

In addition, we can define the associated weights $w_i(t,r)$, set to $0$ when
$i$ is not at risk, by

$$w_i(t,r) = \frac{\pi_t(r|i)}{\pi_t(r)}, \quad\text{so that}\quad \pi_t(r|i) = \pi_t(r)\,w_i(t,r). \tag{6}$$

We highlight a few sampling designs: 
Design 1 The Full Cohort. When information on all subjects is available,
we may take $\pi_t(r|i)$ to be the indicator of the set of those at risk at
time $t$;

$$\pi_t(r|i) = 1(r = \mathcal{R}(t)) \quad\text{and so}\quad w_i(t,r) = 1(i \in \mathcal{R}(t),\ r = \mathcal{R}(t)).$$

The classical Mantel-Haenszel estimator is recovered under this scheme when
$\eta = 1$ and the covariates are time fixed. More generally, in this design
and others, we allow censoring and multi-level, time dependent exposures.

When the collection of covariate data on the full cohort is impractical 
and no additional information on cohort members is available, the nested 
case-control design is a natural choice: 

Design 2 Nested Case-Control Sampling. At each failure time, a simple
random sample of $m - 1$ individuals is chosen from those at risk to serve as
controls for the failure;

$$\pi_t(r|i) = \binom{n(t)-1}{m-1}^{-1} 1(r \subset \mathcal{R}(t),\ r \ni i,\ |r| = m).$$

The probabilities in (4) and weights (6) for this design are given,
respectively, by

$$\pi_t(r) = \binom{n(t)}{m}^{-1} 1(r \subset \mathcal{R}(t),\ |r| = m) \quad\text{and}\quad w_i(t,r) = \frac{n(t)}{m}$$

for $i \in r \subset \mathcal{R}(t)$.
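
The relations between the conditional probabilities, the marginal
probabilities and the weights for nested case-control sampling can be checked
by brute force on a tiny risk set. The sketch below is illustrative only; the
function names and the choice of six subjects with $m = 3$ are assumptions of
this example.

```python
import random
from itertools import combinations
from math import comb


def sample_ncc(risk_set, case, m, rng=random):
    """Sampled risk set: the case plus m-1 controls drawn without replacement."""
    others = [i for i in risk_set if i != case]
    return frozenset([case] + rng.sample(others, m - 1))


def pi_given_case(r, case, risk_set, m):
    """pi_t(r | i) for nested case-control sampling."""
    valid = case in r and len(r) == m and set(r) <= set(risk_set)
    return 1 / comb(len(risk_set) - 1, m - 1) if valid else 0.0


def pi_marginal(r, risk_set, m):
    """pi_t(r) = n(t)^{-1} * sum over i in r of pi_t(r | i)."""
    return sum(pi_given_case(r, i, risk_set, m) for i in r) / len(risk_set)


risk_set = list(range(6))  # six hypothetical subjects at risk
m = 3
subsets = [frozenset(c) for c in combinations(risk_set, m)]
total = sum(pi_marginal(r, risk_set, m) for r in subsets)  # should equal 1
r0 = subsets[0]
weight = pi_given_case(r0, min(r0), risk_set, m) / pi_marginal(r0, risk_set, m)
```

Up to floating point error, `total` is one and `weight` equals
$n(t)/m = 6/3 = 2$, matching the formulas for this design.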

The next two designs we consider, matching and counter matching, depend
on the availability of some additional information on all cohort members. In
particular, we assume that for each $i \in \mathcal{R}(t)$ we have available
the value $C_i(t)$ giving the strata membership of $i$ among the possible
values in $\mathcal{C}$, some (small) finite set. For $l \in \mathcal{C}$ let

$$\mathcal{C}_l(t) = \{i : Y_i(t) = 1,\ C_i(t) = l\} \quad\text{and}\quad c_l(t) = |\mathcal{C}_l(t)|,$$

the sampling stratum, and its size, at time $t$.

Design 3 Matching, with specification $m = (m_l)_{l \in \mathcal{C}}$,
$m_l \ge 1$. If subject $i$ fails at time $t$, then a simple random sample of
$m_{C_i(t)} - 1$ controls is chosen from $\mathcal{C}_{C_i(t)}(t)$, the
failure's stratum at time $t$, to serve as controls for the failure. Hence,
the sampling probabilities of this scheme are given by

$$\pi_t(r|i) = \binom{c_{C_i(t)}(t)-1}{m_{C_i(t)}-1}^{-1} 1(r \subset \mathcal{C}_{C_i(t)}(t),\ r \ni i,\ |r| = m_{C_i(t)}).$$

The probabilities in (4) and weights (6) for this design are given,
respectively, by

$$\pi_t(r) = \sum_{l \in \mathcal{C}} \frac{c_l(t)}{n(t)} \binom{c_l(t)}{m_l}^{-1} 1(r \subset \mathcal{C}_l(t),\ |r| = m_l)$$

and

$$w_i(t,r) = n(t) \sum_{l \in \mathcal{C}} \frac{1}{m_l}\, 1(r \subset \mathcal{C}_l(t),\ |r| = m_l,\ i \in r).$$

The matching design could be used to control for confounding by stratifying
on a potential confounder. In this design we can apply the estimator
$\hat\phi_n$ with no change in the more general situation where there is a
different baseline hazard in each stratum. The consistency of $\hat\phi_n$ in
this situation is preserved when the various conditions are satisfied in each
separate stratum. For details, and the asymptotic variance in this case, see
the analysis of this design in Section 5.

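Design 4 Counter Matching, with specification $m = (m_l)_{l \in \mathcal{C}}$,
$m_l \ge 1$. If subject $i$ fails at time $t$, then $m_l$ controls are
randomly sampled without replacement from each $\mathcal{C}_l(t)$ except for
the failure's stratum, from which $m_{C_i(t)} - 1$ controls are sampled. Let
$\mathcal{P}_{\mathcal{C}}(t)$ denote the set of all subsets of
$\mathcal{R}(t)$ with $m_l$ individuals of type $l$ for all
$l \in \mathcal{C}$. Then for $r \in \mathcal{P}_{\mathcal{C}}(t)$ and
$i \in r$, the sampling probabilities of this scheme are given by

$$\pi_t(r|i) = \left[\prod_{l \in \mathcal{C}} \binom{c_l(t)}{m_l}\right]^{-1} \frac{c_{C_i(t)}(t)}{m_{C_i(t)}}.$$

The probabilities in (4) and weights (6) for this design are given,
respectively, by

$$\pi_t(r) = \left[\prod_{l \in \mathcal{C}} \binom{c_l(t)}{m_l}\right]^{-1} 1(r \subset \mathcal{R}(t),\ |r \cap \mathcal{C}_l(t)| = m_l;\ l \in \mathcal{C})$$

and

$$w_i(t,r) = c_{C_i(t)}(t)/m_{C_i(t)}.$$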
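
A minimal Python sketch of counter-matched sampling at a single failure time
follows. It draws the sampled risk set and attaches to each member the weight
given by its stratum size divided by the number sampled from that stratum.
The function name, the dictionary-based inputs and the toy strata are
assumptions of this example.

```python
import random


def sample_counter_matched(strata, case, m, rng=random):
    """Counter-matched sampled risk set and weights at one failure time.

    strata: dict mapping each subject at risk to its stratum label.
    m:      dict mapping each stratum label to the number of subjects the
            sampled risk set must contain from that stratum.
    Returns (sampled subjects, dict subject -> weight c_l / m_l).
    """
    members = {}
    for subject, label in strata.items():
        members.setdefault(label, []).append(subject)
    sampled = [case]
    for label, group in members.items():
        pool = [s for s in group if s != case]
        # The case already fills one slot in its own stratum.
        k = m[label] - 1 if label == strata[case] else m[label]
        sampled += rng.sample(pool, k)
    weights = {s: len(members[strata[s]]) / m[strata[s]] for s in sampled}
    return sampled, weights


# Hypothetical risk set: four subjects of type 'a', two of type 'b'.
strata = {0: "a", 1: "a", 2: "a", 3: "a", 4: "b", 5: "b"}
sampled, weights = sample_counter_matched(
    strata, case=4, m={"a": 1, "b": 1}, rng=random.Random(1)
)
```

The weights of the sampled subjects always sum to $n(t)$, here 6, since each
stratum contributes $m_l \cdot c_l(t)/m_l = c_l(t)$.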
An important instance where the counter matching design can be applied
is where a surrogate exposure is available on all subjects. In Section 5 we
show that significant efficiency gains over random sampling can be achieved
when the surrogate exposure is sufficiently correlated with the true exposure.
Additional sampling schemes to which our results can be applied can be found
in Borgan, Goldstein, and Langholz (1995), in particular, counter matching
with additionally randomly sampled controls, and quota sampling.

To study sampling schemes and provide an extension of the Mantel-Haenszel
estimator which functions in a generality that accommodates time varying and
multi-level exposures, we set the model in the counting process framework.
Let $N_{i,r}(t)$ be the counting process that records the number of times in
$(0,t]$ that $i$ fails and $r$ is chosen as the sampled risk set. By summing
the counting processes $N_{i,r}(t)$, we obtain

$$N_r^k(t) = \sum_{i \in \mathcal{R}_k(t)} N_{i,r}(t) \quad\text{and}\quad N_r(t) = \sum_{k=0}^{\eta} N_r^k(t), \tag{7}$$

recording, respectively, the number of times in $(0,t]$ that $r$ was chosen as
the sampled risk set for a failure in $\mathcal{R}_k(t)$, and the total number
of times in $(0,t]$ that $r$ was chosen as the sampled risk set. Now let

$$A_r^k(t) = \sum_{i \in \mathcal{R}_k(t)} \pi_t(r|i) = \pi_t(r) \sum_{i \in r \cap \mathcal{R}_k(t)} w_i(t,r), \quad k = 0, \ldots, \eta. \tag{8}$$

For a given continuous function $a : \mathbb{R}^{\eta+1} \to [0, \infty)$,
define

$$a_r(t) = a(A_r^0(t), \ldots, A_r^\eta(t)),$$

and suppressing dependence on $a$, for $j \neq k$ set

$$R_{jk}(t) = \int_0^t \sum_{r \subset \mathcal{R}} a_r(s) A_r^k(s)\, dN_r^j(s), \quad\text{and}\quad R_{jk} = R_{jk}(\tau). \tag{9}$$

It is convenient to choose an $a$ for which

$$|a(v_0, \ldots, v_\eta)\, v_k| \le 1 \quad\text{for all } v_k \ge 0,\ k = 0, \ldots, \eta. \tag{10}$$

A natural choice which satisfies condition (10) is

$$a(v_0, \ldots, v_\eta) = (v_0 + \cdots + v_\eta)^{-1}, \tag{11}$$

extending the $\eta = 1$ canonical choice of $a(u,v) = (u+v)^{-1}$. By (4), for
this $a$ and sets $r$ with $\pi_t(r) \neq 0$ we have

$$a_r(t) = \Big(\sum_{i \in r} \pi_t(r|i)\Big)^{-1} = (n(t)\pi_t(r))^{-1}, \tag{12}$$

and hence, by (9), with $t_{1j} < t_{2j} < \cdots$ the ordered collection of
failure times for individuals with covariate $a_j$ at the time of failure, and
$\mathcal{R}_{ij}$ the sampled risk set at failure time $t_{ij}$,

$$R_{jk} = \sum_{i \ge 1} \frac{1}{n(t_{ij})} \sum_{\iota \in \mathcal{R}_{ij},\ Z_\iota(t_{ij}) = a_k} w_\iota(t_{ij}, \mathcal{R}_{ij}). \tag{13}$$

For the full cohort information (Design 1), since

$$\mathcal{R}_{ij} = \mathcal{R}(t_{ij}) \quad\text{and}\quad w_i(t, \mathcal{R}(t)) = 1 \text{ for } i \in \mathcal{R}(t),$$

the expression for $R_{jk}$ in (13) reduces to that in (1).
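
To make the sampled-risk-set sums concrete, the sketch below evaluates
$R_{jk}$ for the canonical choice $a(v_0,\ldots,v_\eta) = (v_0+\cdots+v_\eta)^{-1}$
as the sum, over failures in group $j$, of the total weight of sampled
subjects in group $k$ divided by $n(t)$. The record layout and the toy data
(1:1 nested case-control sampling, so every weight is $n(t)/2$) are
assumptions of this example.

```python
def r_jk(failure_sets, j, k):
    """R_jk for the canonical a, computed from sampled risk sets.

    Each element of failure_sets is a dict with keys
      'case_group': covariate index of the failing subject,
      'n_at_risk' : n(t) at the failure time,
      'sampled'   : list of (group index, weight) pairs for every member
                    of the sampled risk set, the case included.
    """
    total = 0.0
    for fs in failure_sets:
        if fs["case_group"] != j:
            continue
        weight_in_k = sum(w for g, w in fs["sampled"] if g == k)
        total += weight_in_k / fs["n_at_risk"]
    return total


def mh_sampled(failure_sets):
    """Two-group Mantel-Haenszel estimate R_10 / R_01 under sampling."""
    return r_jk(failure_sets, 1, 0) / r_jk(failure_sets, 0, 1)


# Hypothetical 1:1 nested case-control data; weights are n(t) / 2.
failure_sets = [
    {"case_group": 1, "n_at_risk": 10, "sampled": [(1, 5.0), (0, 5.0)]},
    {"case_group": 0, "n_at_risk": 9, "sampled": [(0, 4.5), (1, 4.5)]},
    {"case_group": 1, "n_at_risk": 8, "sampled": [(1, 4.0), (1, 4.0)]},
    {"case_group": 1, "n_at_risk": 7, "sampled": [(1, 3.5), (0, 3.5)]},
]
phi_hat = mh_sampled(failure_sets)
```

With one control per case the estimate reduces to the ratio of the two kinds
of discordant pairs, here 2 to 1. Setting every weight to one and listing the
whole risk set as `sampled` gives the full cohort sums.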

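Noting that for $\eta = 1$ the estimator (2) is the solution of the linear
estimating equation

$$\phi R_{01} - R_{10} = 0,$$

it is therefore the unique minimizer of $G_{01}^2(\phi)$ where with $j < k$ we
let

$$G_{jk}(\phi) = \phi^{a_k} R_{jk} - \phi^{a_j} R_{kj}. \tag{14}$$

Hence, given non-negative constants $c_{jk}$ not all zero, we propose as our
estimator a value $\hat\phi_n$ which minimizes the weighted sum of squares

$$\sum_{j<k} c_{jk}\, G_{jk}^2(\phi),$$

that is, a solution to the estimating equation $U_n(\phi) = 0$, where, with
$G'_{jk}(\phi)$ denoting the derivative of $G_{jk}(\phi)$ with respect to
$\phi$,

$$U_n(\phi) = n^{-1} \sum_{j<k} c_{jk}\, G_{jk}(\phi)\, G'_{jk}(\phi) \tag{15}$$

$$\phantom{U_n(\phi)} = n^{-1} \sum_{j<k} c_{jk} \left(\phi^{a_k} R_{jk} - \phi^{a_j} R_{kj}\right)\left(a_k \phi^{a_k-1} R_{jk} - a_j \phi^{a_j-1} R_{kj}\right).$$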




We prove that the estimator $\hat\phi_n$ is consistent for $\phi_0$ under the
conditions specified in Theorem 3.1 and establish its asymptotic normal
distribution in Theorem 3.2. Proposition 3.3 shows how to choose the constants
$c_{jk}$ to achieve the minimum asymptotic variance over the class of all
estimators of this form. For other possibilities regarding the construction
of estimating equations which may have some efficiency advantages, see Qu et
al. (2000), Godambe (1960), and Heyde (1997).
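
For more than two exposure levels the estimating equation has no closed form
solution, but it is a one dimensional root-finding problem. The Python sketch
below is one way to solve it, assuming the sums $R_{jk}$ have already been
computed, that $G_{jk}(\phi) = \phi^{a_k}R_{jk} - \phi^{a_j}R_{kj}$, and that a
bracket containing a sign change of the estimating function is known. The
bisection routine, the bracket and the synthetic $R_{jk}$ values are
assumptions of this example; the factor $n^{-1}$ is dropped because it does
not move the root.

```python
def make_estimating_function(R, a, c):
    """Return phi -> sum over j<k of c[j][k] * G_jk(phi) * G_jk'(phi)."""
    def U(phi):
        total = 0.0
        for j in range(len(a)):
            for k in range(j + 1, len(a)):
                G = phi ** a[k] * R[j][k] - phi ** a[j] * R[k][j]
                dG = (a[k] * phi ** (a[k] - 1) * R[j][k]
                      - a[j] * phi ** (a[j] - 1) * R[k][j])
                total += c[j][k] * G * dG
        return total
    return U


def bisect(f, lo, hi, tol=1e-10):
    """Plain bisection; requires f(lo) and f(hi) to have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Synthetic three-level input built so that every G_jk vanishes at phi = 2.
a = [0, 1, 2]
R = [[0.0, 1.0, 0.5],
     [2.0, 0.0, 1.6],
     [2.0, 3.2, 0.0]]
c = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
phi_hat = bisect(make_estimating_function(R, a, c), 0.01, 100.0)
```

The estimating function may have several roots in general, so in practice the
bracket would be chosen near a preliminary estimate such as a two-group
Mantel-Haenszel ratio.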

Where estimates of $\phi_0$ can be used to assess the magnitude of the effect
that exposure has on failure, estimates of the integrated baseline hazard

$$\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$$

can in turn be used to provide estimates of absolute risk. We consider the
integrated baseline hazard function estimate

$$\hat\Lambda_n(t, \hat\phi_n) = \int_0^t \sum_{r \subset \mathcal{R}} \frac{dN_r(u)}{\sum_{i \in r} \hat\phi_n^{Z_i(u)}\, w_i(u,r)}, \tag{16}$$

given in terms of the weights defined in (6), where the ratio in the integral
is regarded as $0$ if there is no one at risk. In Theorem 4.1 we give
conditions under which

$$\sqrt{n}\left(\hat\Lambda_n(\cdot, \hat\phi_n) - \Lambda_0(\cdot)\right)$$

converges weakly as $n \to \infty$ to a mean zero Gaussian process, and provide
a uniformly consistent estimator for its variance function.

The counting process model and some of its consequences are derived
in Section 2. The consistency and asymptotic normality of $\hat\phi_n$ and
$\hat\Lambda_n$ are proved in Sections 3 and 4 respectively. In Section 5 we
study the asymptotic properties of these estimators under Designs 1 - 4 and
present efficiency comparisons against the partial likelihood estimator. Much
of the analysis here follows closely the work of Borgan, Goldstein, and
Langholz (1995), hereafter referred to as BGL.



2 The Counting Process Model for Sampling 

We will assume that the censoring and failure information are defined on
a probability space with a standard filtration $\mathcal{F}_t$, and that the
censoring indicators $Y_i(t)$, exposures $Z_i(t)$, design $\pi_t(r|i)$ and
strata variables $C_i(t)$ are left continuous and adapted, and hence
predictable and locally bounded. We make the assumption of independent
sampling as in BGL, that the intensity processes with respect to the
filtration $\mathcal{F}_t$ are the same as those with respect to this
filtration augmented with the sampling information; in other words, we assume
that selecting an individual as a control does not influence the likelihood of
failure of the individual in the future. We assume that the intensity process
of $N_{i,r}(t)$ is given by

$$\lambda_{i,r}(t) = \phi_0^{Z_i(t)}\, \pi_t(r|i)\, \lambda_0(t), \tag{17}$$

so that subtracting the integrated intensity from the counting processes
$N_{i,r}(t)$ results in the orthogonal local square integrable martingales

$$M_{i,r}(t) = N_{i,r}(t) - \int_0^t \lambda_{i,r}(s)\,ds, \tag{18}$$

with predictable quadratic variation

$$d\langle M_{i,r} \rangle_t = \lambda_{i,r}(t)\,dt.$$

Further, we assume that the baseline hazard function $\lambda_0(t)$ is bounded
away from zero and infinity.

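With $A_r^k(t)$ given in (8), by linearity the counting processes $N_r^k(t)$
and $N_r(t)$ defined in (7) have respective intensities

$$\lambda_r^k(t) = \phi_0^{a_k} A_r^k(t)\lambda_0(t) \quad\text{and}\quad \lambda_r(t) = \sum_{k=0}^{\eta} \phi_0^{a_k} A_r^k(t)\lambda_0(t), \tag{19}$$

and give rise to the orthogonal local square integrable martingales

$$M_r^k(t) = \sum_{i \in \mathcal{R}_k(t)} M_{i,r}(t) = N_r^k(t) - \int_0^t \lambda_r^k(s)\,ds \quad\text{and}$$

$$M_r(t) = \sum_{k=0}^{\eta} M_r^k(t) = N_r(t) - \int_0^t \lambda_r(s)\,ds,$$

with predictable variations

$$d\langle M_r^k \rangle_t = \lambda_r^k(t)\,dt \quad\text{and}\quad d\langle M_r \rangle_t = \lambda_r(t)\,dt.$$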
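Using (9), (19) and the martingale decomposition of $N_r^j$, we have

$$R_{jk}(t) = \int_0^t \sum_{r \subset \mathcal{R}} a_r(s) A_r^k(s) \left(\phi_0^{a_j} A_r^j(s)\lambda_0(s)\,ds + dM_r^j(s)\right).$$

For $\mathbf{v}$ a multi-subset of $\{0, \ldots, \eta\}$, e.g.
$\mathbf{v} = \{0, 0, 1\}$, let

$$H_{\mathbf{v}}(t) = \sum_{r \subset \mathcal{R}} a_r^{|\mathbf{v}|-1}(t) \prod_{k \in \mathbf{v}} A_r^k(t). \tag{21}$$

In particular, for $|\mathbf{v}| = 2$, and subscripting by $jk$ rather than
$\{j,k\}$ for notational convenience, we have

$$H_{jk}(t) = \sum_{r \subset \mathcal{R}} a_r(t) A_r^j(t) A_r^k(t).$$

Letting in addition

$$W_{jk}(t) = \int_0^t \sum_{r \subset \mathcal{R}} a_r(s) A_r^k(s)\, dM_r^j(s), \tag{22}$$

we may write $R_{jk}(t)$ as

$$R_{jk}(t) = \phi_0^{a_j} \int_0^t H_{jk}(s)\lambda_0(s)\,ds + W_{jk}(t). \tag{23}$$

The processes $W_{jk}$ are local square integrable martingales, and by the
orthogonality of the $M_r^j(s)$ and their predictable variations, have
predictable quadratic covariation

$$d\langle W_{jk}, W_{pq} \rangle_t = 1(j=p) \sum_{r \subset \mathcal{R}} a_r^2(t) A_r^k(t) A_r^q(t)\, d\langle M_r^j \rangle_t = 1(j=p)\, \phi_0^{a_j} H_{jkq}(t)\lambda_0(t)\,dt, \tag{24}$$

so in particular

$$d\langle W_{jk} \rangle_t = \phi_0^{a_j} H_{jkk}(t)\lambda_0(t)\,dt. \tag{25}$$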
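By (23) and $H_{jk} = H_{kj}$, suppressing the dependence on $\phi_0$ which is
explicit in (14), and now considering $G_{jk}$ as a function of $t$, for
$j < k$ the processes

$$G_{jk}(t) = \phi_0^{a_k} R_{jk}(t) - \phi_0^{a_j} R_{kj}(t) = \phi_0^{a_k} W_{jk}(t) - \phi_0^{a_j} W_{kj}(t) \tag{26}$$

are local square integrable martingales with quadratic covariation given, for
$j < k$, $p < q$, by

$$d\langle G_{jk}, G_{pq} \rangle_t = d\langle \phi_0^{a_k} W_{jk} - \phi_0^{a_j} W_{kj},\ \phi_0^{a_q} W_{pq} - \phi_0^{a_p} W_{qp} \rangle_t$$

$$= \left[\phi_0^{a_j+a_k+a_q}\left(1(j=p) - 1(k=p)\right) H_{jkq}(t) + \phi_0^{a_j+a_k+a_p}\left(1(k=q) - 1(j=q)\right) H_{jkp}(t)\right] \lambda_0(t)\,dt;$$

in particular,

$$d\langle G_{jk} \rangle_t = \phi_0^{a_j+a_k} \left(\phi_0^{a_k} H_{jkk}(t) + \phi_0^{a_j} H_{kjj}(t)\right) \lambda_0(t)\,dt.$$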



3 Asymptotics of $\hat\phi_n$

We prove the consistency and asymptotic normality of $\hat\phi_n$, a solution
to $U_n(\phi) = 0$ with $U_n(\phi)$ given by (15), under some regularity and
stability conditions.

Condition 1 The cumulative hazard on the interval $[0, \tau]$ is finite:

$$\Lambda_0(\tau) < \infty.$$

For $H_{\mathbf{v}}(t)$ given in (21), define

$$h_{n,\mathbf{v}}(t) = \frac{1}{n} H_{\mathbf{v}}(t). \tag{27}$$

Condition 2 For $h_{n,\mathbf{v}}(t)$ with $|\mathbf{v}| \in \{2,3\}$, there
exist left continuous functions $\bar h_{n,\mathbf{v}}(t)$,
$h_{\mathbf{v}}(t)$, $\bar h_{\mathbf{v}}(t)$ such that for all
$t \in [0,\tau]$,

$$0 \le h_{n,\mathbf{v}}(t) \le \bar h_{n,\mathbf{v}}(t),$$

and for almost all $t$ in $[0,\tau]$,

$$h_{n,\mathbf{v}}(t) \to_p h_{\mathbf{v}}(t), \quad\text{and}\quad \bar h_{n,\mathbf{v}}(t) \to_p \bar h_{\mathbf{v}}(t),$$

and

$$\int_0^\tau \bar h_{n,\mathbf{v}}(t)\lambda_0(t)\,dt \to_p \int_0^\tau \bar h_{\mathbf{v}}(t)\lambda_0(t)\,dt < \infty.$$

Note that for any $a$ satisfying (10), so in particular for the canonical
choice of $a$ given by (11), we may take
$\bar h_{n,\mathbf{v}}(t) = \bar h_{\mathbf{v}}(t) = 1$, since for
$|\mathbf{v}| - 1$ of the terms in the product in (21) we have
$a_r(t) A_r^k(t) \le 1$, and applying the additional factor $1/n$ granted in
(27), we have by (5) that $A_r^k(t)/n \le 1$, taking care of the remaining
factor of $A_r^k(t)$ in the product. Hence, if Condition 1 holds and $a$
satisfies (10) then Condition 2 holds provided only that
$h_{n,\mathbf{v}}(t) \to_p h_{\mathbf{v}}(t)$ for $|\mathbf{v}| \in \{2,3\}$.

The following version of the dominated convergence theorem is due to
Hjort and Pollard (1993):

Proposition 3.1 Suppose $\Lambda_0(\tau) < \infty$ and let
$0 \le U_n(t) \le \bar U_n(t)$ be left-continuous random processes on the
interval $[0,\tau]$. Suppose $\bar U_n(t) \to_p \bar U(t)$ and
$U_n(t) \to_p U(t)$ for almost all $t$, as $n \to \infty$, and that
$\int_0^\tau \bar U_n(s)\lambda_0(s)\,ds \to_p \int_0^\tau \bar U(s)\lambda_0(s)\,ds < \infty$.
Then $\int_0^t U_n(s)\lambda_0(s)\,ds \to_p \int_0^t U(s)\lambda_0(s)\,ds$ for
all $t \in [0,\tau]$ as $n \to \infty$.

For given $\mathbf{v}$ and corresponding $h_{\mathbf{v}}(t)$, let

$$I_{\mathbf{v}}(t) = \int_0^t h_{\mathbf{v}}(s)\lambda_0(s)\,ds, \tag{28}$$

and for $p$ a non-negative integer, let $(a)_p = a!/(a-p)!$, the falling
factorial.

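Proposition 3.2 Let Conditions 1 and 2 hold. Then for every $t \in [0,\tau]$,

$$n^{-1} \langle W_{jk}, W_{pq} \rangle_t \to_p 1(j=p)\, \phi_0^{a_j} I_{jkq}(t), \tag{29}$$

and

$$n^{-1} R_{jk}(t) \to_p \phi_0^{a_j} I_{jk}(t). \tag{30}$$

Furthermore, as $n \to \infty$, for all $j < k$ and $p = 0, 1, \ldots$, the
$p^{th}$ derivatives of $G_{jk}(\phi)$ defined in (14) satisfy

$$n^{-1} G_{jk}^{(p)}(\phi) \to_p \left((a_k)_p\, \phi^{a_k-p} \phi_0^{a_j} - (a_j)_p\, \phi^{a_j-p} \phi_0^{a_k}\right) I_{jk}(\tau). \tag{31}$$

In particular, for $p = 1$

$$n^{-1} G'_{jk}(\phi_0) \to_p \beta_{jk} = (a_k - a_j)\, \phi_0^{a_j+a_k-1} I_{jk}(\tau), \tag{32}$$

and with $U_n(\phi)$ as in (15),

$$n^{-1} U'_n(\phi_0) \to_p \gamma \quad\text{where}\quad \gamma = \sum_{j<k} c_{jk}\beta_{jk}^2. \tag{33}$$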

Proof: Conditions 1 and 2 and Proposition 3.1 give

$$\int_0^t h_{n,\mathbf{v}}(s)\lambda_0(s)\,ds \to_p \int_0^t h_{\mathbf{v}}(s)\lambda_0(s)\,ds \tag{34}$$

for $|\mathbf{v}| \in \{2,3\}$. In particular, with $|\mathbf{v}| = 3$, by
(24) we obtain (29). By (23) and (27) we have

$$\frac{1}{n} R_{jk}(t) = \phi_0^{a_j} \int_0^t h_{n,jk}(s)\lambda_0(s)\,ds + \frac{1}{n} W_{jk}(t).$$

By (34) the first term converges to the right hand side of (30). For the
second term, by (25), (27) and Lenglart's inequality (see Andersen et al.
(1993)), for all positive $\epsilon$, $\delta$,

$$P\left(\sup_{t \le \tau} \left|\frac{1}{n} W_{jk}(t)\right| > \epsilon\right) \le \frac{\delta}{\epsilon^2} + P\left(\frac{\phi_0^{a_j}}{n} \int_0^\tau h_{n,jkk}(t)\lambda_0(t)\,dt > \delta\right).$$

Now applying (34) for $\mathbf{v} = \{j,k,k\}$, we see

$$\frac{1}{n} W_{jk}(t) \to_p 0 \quad\text{as } n \to \infty,$$

and hence (30); (31) now follows immediately from (14) and (30).
Taking derivatives in (15) yields

$$U'_n(\phi) = n^{-1} \sum_{j<k} c_{jk} \left(G'_{jk}(\phi)^2 + G_{jk}(\phi)\, G''_{jk}(\phi)\right),$$

and (33) now follows using (32) for the first term, and (31) for $p = 0$ at
$\phi = \phi_0$ to show the second term vanishes. ■

Our first result gives the consistency of $\hat\phi_n$ under the following
additional non-triviality condition; in particular, note that $c_{jk}$ can
always be chosen to be positive for all $j < k$.
Condition 3 There exists some pair $j < k$ for which both $c_{jk}$ in (15)
and $I_{jk}(\tau)$ in (28) are strictly positive.

Theorem 3.1 Under Conditions 1 through 3 the estimating equation (15)
has a consistent sequence of solutions

$$\hat\phi_n \to_p \phi_0 \quad\text{as } n \to \infty.$$

Proof: By the arguments of Aitchison and Silvey (1958) and Billingsley
(1961), it suffices to show that as $n \to \infty$,

$$n^{-1} U_n(\phi_0) \to_p 0, \qquad n^{-1} U'_n(\phi_0) \text{ converges in probability to a positive number,} \tag{35}$$

and that there is a neighborhood $\Theta_0$ of $\phi_0$ such that for every
$\eta \in (0,1)$, there is a $K$ such that for all $n$,

$$P\left(|n^{-1} U''_n(\phi)| \le K,\ \phi \in \Theta_0\right) > 1 - \eta. \tag{36}$$

Applying (31) for $p = 0, 1$ gives the first part of (35); the second part is
the positivity of $\gamma$, which follows from Condition 3.
By (31), each term in the second derivative

$$n^{-1} U''_n(\phi) = n^{-2} \sum_{j<k} c_{jk} \left(3\, G'_{jk}(\phi)\, G''_{jk}(\phi) + G_{jk}(\phi)\, G'''_{jk}(\phi)\right)$$

is uniformly bounded in probability in any bounded neighborhood $\Theta_0$ of
$\phi_0$ not containing zero, giving the uniform boundedness in probability
condition (36). ■

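To obtain the limiting distribution of $\hat\phi_n$ and $\hat\Lambda_n(\cdot)$
we assume the following

Condition 4 There exists $\delta > 0$ such that for all $j < k$,

$$n^{-(1+\delta/2)} \int_0^\tau \sum_{r \subset \mathcal{R}} |a_r(t) A_r^k(t)|^{2+\delta} A_r^j(t)\lambda_0(t)\,dt \to_p 0.$$

Note that Condition 4 is satisfied for any $\delta > 0$ using any function $a$
satisfying (10), so in particular the canonical function $a$ given in (11),
since by (5),

$$\sum_{r \subset \mathcal{R}} |a_r(t) A_r^k(t)|^{2+\delta} A_r^j(t) \le \sum_{r \subset \mathcal{R}} A_r^j(t) \le \sum_{r \subset \mathcal{R}} \sum_{i \in r} \pi_t(r|i) = n(t) \le n.$$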
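Lemma 3.1 Under Conditions 1-4, the processes $\{n^{-1/2} W_{jk}(\cdot)\}_{j,k}$
given in (22) converge jointly in $D[0,\tau]$ to the mean zero Gaussian
processes $\{w_{jk}(\cdot)\}_{j,k}$ with covariation function

$$\langle w_{jk}, w_{pq} \rangle_t = 1(j=p)\, \phi_0^{a_j} I_{jkq}(t),$$

and hence the collection $\{n^{-1/2} G_{jk}\}_{j<k}$ given in (26) converges
jointly in $D[0,\tau]$ to the mean zero Gaussian processes
$\{g_{jk}(\cdot)\}_{j<k}$ with covariation function

$$\langle g_{jk}, g_{pq} \rangle_t = \phi_0^{a_j+a_k+a_q}\left(1(j=p) - 1(k=p)\right) I_{jkq}(t) + \phi_0^{a_j+a_k+a_p}\left(1(k=q) - 1(j=q)\right) I_{jkp}(t), \tag{37}$$

so in particular,

$$\langle g_{jk} \rangle_t = \phi_0^{a_j+a_k} \int_0^t \left(\phi_0^{a_k} h_{jkk}(s) + \phi_0^{a_j} h_{kjj}(s)\right) \lambda_0(s)\,ds.$$

Further, for any permutation $(\iota, \kappa, \chi)$ of $(j,k,q)$, all
$t \in [0,\tau]$, and any consistent sequence $\hat\phi_n \to_p \phi_0$, as
$n \to \infty$,

$$\hat\phi_n^{-a_\iota}\, n^{-1} [W_{\iota\kappa}, W_{\iota\chi}]_t \to_p \phi_0^{-a_\iota} \langle w_{\iota\kappa}, w_{\iota\chi} \rangle_t = I_{jkq}(t),$$

where

$$n^{-1} [W_{\iota\kappa}, W_{\iota\chi}]_t = \frac{1}{n} \int_0^t \sum_{r \subset \mathcal{R}} a_r^2(s) A_r^\kappa(s) A_r^\chi(s)\, dN_r^\iota(s) \tag{38}$$

is the scaled optional variation, so that

$$\hat I_{jkq}(t) = n^{-1} \sum \zeta_{\iota\kappa\chi}\, \hat\phi_n^{-a_\iota} [W_{\iota\kappa}, W_{\iota\chi}]_t \to_p I_{jkq}(t), \tag{39}$$

where the sum is over all permutations $(\iota, \kappa, \chi)$ of $(j,k,q)$,
and $\sum \zeta_{\iota\kappa\chi} = 1$.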

Proof: We apply the martingale central limit theorem of Rebolledo, as
presented in Theorem II.5.1 of Andersen et al. (1993). The processes
$\{n^{-1/2} W_{jk}\}_{j,k}$ are local square integrable martingales, whose
predictable quadratic variation converges by Proposition 3.2 to the continuous
functions given in (29). The Lindeberg condition follows from Condition 4,
since for every $\epsilon > 0$,

$$\frac{1}{n} \int_0^\tau \sum_{r \subset \mathcal{R}} \left(a_r(t) A_r^k(t)\right)^2 1\left(n^{-1/2} |a_r(t) A_r^k(t)| > \epsilon\right) \lambda_r^j(t)\,dt$$

$$\le \frac{\phi_0^{a_j}}{\epsilon^{\delta}\, n^{1+\delta/2}} \int_0^\tau \sum_{r \subset \mathcal{R}} |a_r(t) A_r^k(t)|^{2+\delta} A_r^j(t)\lambda_0(t)\,dt \to_p 0.$$

The convergence of the scaled optional variation (38) to the limit (29) of
the scaled predictable variation follows from Theorem II.5.1 of Andersen et
al. (1993). ■

With $S_r^{(0)}(\phi,t)$ as in (52), in place of (39) one may consider the
estimated scaled predictable variation

$$\tilde I_{jkq}(t) = \frac{1}{n} \int_0^t \sum_{r \subset \mathcal{R}} a_r^2(s) A_r^j(s) A_r^k(s) A_r^q(s)\, \frac{dN_r(s)}{S_r^{(0)}(\hat\phi_n, s)}. \tag{40}$$

Since $dN_r(t) = S_r^{(0)}(\phi_0,t)\lambda_0(t)\,dt + dM_r(t)$, replacing
$\hat\phi_n$ by $\phi_0$ in (40) gives the scaled predictable variation plus a
martingale term, and so (40) converges in probability to $I_{jkq}(t)$ when the
martingale term tends to zero and the replacement of $\hat\phi_n$ by $\phi_0$
is asymptotically negligible.

The variance estimators based on (39) and (40) simplify considerably
in special cases. The variance estimator of Zhang, Fujii, and Yanagawa
(2000) for simple random sampling uses the estimated predictable variation
(40). The empirical and conditional variance estimators of Breslow (1981),
with one case per set, correspond to the optional and estimated predictable
variation estimators for simple random sampling for the canonical $a$ as in
(11).

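Theorem 3.2 Under Conditions 1-4, for $\hat\phi_n$ any consistent sequence of
solutions of the estimating equation (15), we have

$$\sqrt{n}\,(\hat\phi_n - \phi_0) \to_d N(0, \sigma^2) \quad\text{where}\quad \sigma^2 = \nu^2\gamma^{-2}, \tag{41}$$

with $\gamma$ as in (33) and

$$\nu^2 = \sum_{j<k,\ p<q} c_{jk}\,\beta_{jk}\, \langle g_{jk}, g_{pq} \rangle_\tau\, \beta_{pq}\, c_{pq}, \tag{42}$$

with $\beta_{jk}$ as in (32) and $\langle g_{jk}, g_{pq} \rangle_\tau$ in (37).
By (39), $\langle g_{jk}, g_{pq} \rangle_\tau$ can be consistently estimated by
$\widehat{\langle g_{jk}, g_{pq} \rangle}_\tau$, given by

$$\widehat{\langle g_{jk}, g_{pq} \rangle}_\tau = \hat\phi_n^{a_j+a_k+a_q}\left(1(j=p) - 1(k=p)\right) \hat I_{jkq}(\tau) + \hat\phi_n^{a_j+a_k+a_p}\left(1(k=q) - 1(j=q)\right) \hat I_{jkp}(\tau), \tag{43}$$

whereas by (30) and (32), $\beta_{jk}$ can be consistently estimated by

$$\hat\beta_{jk} = (a_k - a_j)\, \hat\phi_n^{a_j+a_k-1}\, \hat I_{jk}(\tau), \tag{44}$$

where

$$\hat I_{jk}(\tau) = n^{-1}\left(\zeta_{jk}\, \hat\phi_n^{-a_j} R_{jk} + \zeta_{kj}\, \hat\phi_n^{-a_k} R_{kj}\right)$$

for any weights $\zeta_{jk}, \zeta_{kj}$ summing to one. Hence

$$\hat\sigma^2 = \hat\nu^2 \hat\gamma^{-2} \tag{45}$$

consistently estimates $\sigma^2$, where $\hat\nu^2$ and $\hat\gamma$ are
obtained by substituting (43) and (44) into (42), and (44) into (33),
respectively. In the parameterization $\theta = \log\phi$,

$$\sqrt{n}\,(\hat\theta_n - \theta_0) \to_d N(0, \sigma^2/\phi_0^2). \tag{46}$$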
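
The log parameterization is commonly used to build confidence intervals that
stay inside $(0, \infty)$. The sketch below assumes that $\sqrt{n}$ times the
error in $\log\hat\phi_n$ is approximately normal with variance
$\sigma^2/\phi_0^2$, plugs in estimates for both quantities, and exponentiates
the resulting interval. The function name and the numerical inputs are
hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def log_scale_interval(phi_hat, sigma2_hat, n, level=0.95):
    """Approximate confidence interval for phi built on the log scale.

    phi_hat:    point estimate of the rate ratio.
    sigma2_hat: estimate of the asymptotic variance of sqrt(n)*(phi_hat - phi_0).
    n:          cohort size.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Delta method: the standard error of log(phi_hat) is sigma / (phi * sqrt(n)).
    se_theta = sqrt(sigma2_hat / n) / phi_hat
    theta_hat = log(phi_hat)
    return exp(theta_hat - z * se_theta), exp(theta_hat + z * se_theta)


lower, upper = log_scale_interval(phi_hat=2.0, sigma2_hat=4.0, n=100)
```

The interval is symmetric around the estimate on the multiplicative scale, so
`lower * upper` equals `phi_hat ** 2`.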

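Proof: By the consistency of the solution $\hat\phi_n$ for $\phi_0$, the fact
that $n^{-1} U'_n(\phi_0) \to_p \gamma > 0$, and the uniform boundedness of the
second derivative of $U_n(\phi)$ in a neighborhood of $\phi_0$ given in (36),
we have

$$\sqrt{n}\,(\hat\phi_n - \phi_0) = -\gamma^{-1} n^{-1/2} U_n(\phi_0) + o_p(1). \tag{47}$$

But

$$n^{-1/2} U_n(\phi_0) = \sum_{j<k} c_{jk} \left(n^{-1/2} G_{jk}(\phi_0)\right)\left(n^{-1} G'_{jk}(\phi_0)\right) \to_d N(0, \nu^2)$$

by (15), Lemma 3.1 and (32) of Proposition 3.2, which gives (41); (46) now
follows by a direct application of the delta method. ■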
Let $c$ be the vector of the constants $c_{jk}$ obtained by taking the pairs
$j < k$ in some canonical order, say lexicographically; with the same indexing
form a matrix $\Gamma$ with entries $\langle g_{jk}, g_{pq} \rangle_\tau$ and a
diagonal matrix $B$ with diagonal entries $\beta_{jk}$. Note that when
$\Gamma$ is positive definite the matrix $B\Gamma B$ is also, and therefore
there exists a non-singular matrix $M$ such that

$$B\Gamma B = M'M. \tag{48}$$

Proposition 3.3 provides the constants $c_{jk}$ which minimize the asymptotic
variance (41) of $\hat\phi_n$.

Proposition 3.3 Let $\Gamma$ be positive definite, $\mathbf{1}$ the vector all
of whose entries are 1, $M$ as in (48), and

$$X = (M^{-1})' B^2 \mathbf{1}\mathbf{1}' B^2 M^{-1}.$$

Then taking

$$c = M^{-1} d,$$

where $d$ is any eigenvector corresponding to the largest eigenvalue $\lambda$
of $X$, minimizes the asymptotic variance (41), with the value
$\sigma^2 = \lambda^{-1}$.

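Proof: In the given notation, we may write

$$\gamma = \mathbf{1}' B^2 c, \quad \nu^2 = c' B\Gamma B c, \quad\text{so that}\quad \sigma^2 = \frac{c' B\Gamma B c}{c' B^2 \mathbf{1}\mathbf{1}' B^2 c}.$$

Then letting $d = Mc$ we have by (48)

$$\sigma^{-2} = \frac{c' B^2 \mathbf{1}\mathbf{1}' B^2 c}{c' B\Gamma B c} = \frac{c' B^2 \mathbf{1}\mathbf{1}' B^2 c}{c' M'M c} = \frac{d' (M^{-1})' B^2 \mathbf{1}\mathbf{1}' B^2 M^{-1} d}{d'd} = \frac{d' X d}{d'd},$$

which has its maximum value of $\lambda$, the largest eigenvalue of $X$, when
$d$ is a corresponding eigenvector. ■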
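
As a numerical illustration of the optimal choice of constants, the sketch
below treats the case of exactly two pairs $j < k$. It relies on an
observation that is not stated in the text and should be read as an
assumption of this example: writing $\Gamma$ for the covariance matrix with
entries $\langle g_{jk}, g_{pq} \rangle_\tau$ and $\beta$ for the vector of the
$\beta_{jk}$, the matrix $X$ has rank one, which gives the explicit solution
$c = B^{-1}\Gamma^{-1}\beta$ with minimal variance
$1/(\beta'\Gamma^{-1}\beta)$ when all $\beta_{jk}$ are non-zero. The input
values are hypothetical.

```python
def optimal_weights(gamma, beta):
    """Variance-minimizing constants for two pairs (a 2x2 illustration).

    gamma: 2x2 positive definite covariance matrix, as nested lists.
    beta:  the two non-zero coefficients beta_jk.
    Returns (c, minimal asymptotic variance).
    """
    (g11, g12), (g21, g22) = gamma
    det = g11 * g22 - g12 * g21
    # x = Gamma^{-1} beta, using the explicit inverse of a 2x2 matrix.
    x1 = (g22 * beta[0] - g12 * beta[1]) / det
    x2 = (-g21 * beta[0] + g11 * beta[1]) / det
    c = [x1 / beta[0], x2 / beta[1]]      # c = B^{-1} Gamma^{-1} beta
    lam = beta[0] * x1 + beta[1] * x2     # beta' Gamma^{-1} beta
    return c, 1.0 / lam


def asymptotic_variance(c, gamma, beta):
    """sigma^2 = nu^2 / gamma_n^2 for given constants c."""
    nu2 = sum(c[p] * beta[p] * gamma[p][q] * beta[q] * c[q]
              for p in range(2) for q in range(2))
    g = sum(c[p] * beta[p] ** 2 for p in range(2))
    return nu2 / g ** 2


gamma = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical covariance entries
beta = [1.0, 2.0]                  # hypothetical beta_jk values
c_opt, var_opt = optimal_weights(gamma, beta)
var_equal = asymptotic_variance([1.0, 1.0], gamma, beta)
```

Here equal constants give variance 0.32, while the optimal constants give
0.25. The optimal constants are determined only up to scale, and nothing in
this construction forces them to be non-negative.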
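For $\eta = 1$ and $a_0 = 0$, $a_1 = 1$, because the estimator $\hat\phi_n$ is
given explicitly, the consistency and asymptotic normality of $\hat\phi_n$ can
be shown in a more direct way; the framework, however, remains sufficiently
general to include sampling. In particular, from (2) and Proposition 3.2,

$$\hat\phi_n = \frac{n^{-1} R_{10}}{n^{-1} R_{01}} \to_p \frac{\phi_0 I_{10}(\tau)}{I_{01}(\tau)} = \phi_0, \quad\text{as } n \to \infty,$$

and

$$n^{1/2}(\hat\phi_n - \phi_0) = \frac{n^{-1/2}(R_{10} - \phi_0 R_{01})}{n^{-1} R_{01}} = -\frac{n^{-1/2} G_{01}(\phi_0)}{n^{-1} R_{01}},$$

from which it directly follows using Lemma 3.1 that

$$\sqrt{n}\,(\hat\phi_n - \phi_0) \to_d N(0, \sigma^2),$$

where

$$\sigma^2 = \frac{\int_0^\tau \left(\phi_0^2 h_{011}(t) + \phi_0 h_{100}(t)\right) \lambda_0(t)\,dt}{\left(\int_0^\tau h_{01}(t)\lambda_0(t)\,dt\right)^2}, \tag{49}$$

in agreement with the conclusion of Theorem 3.2 and formulas (42) and (33)
in this special case.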
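Moreover, with the canonical choice $a(u,v) = (u+v)^{-1}$, we have

$$H_{011}(t) + H_{100}(t) = \sum_{r \subset \mathcal{R}} a_r^2(t) A_r^0(t) A_r^1(t) \left(A_r^0(t) + A_r^1(t)\right) = \sum_{r \subset \mathcal{R}} \frac{A_r^0(t) A_r^1(t)}{A_r^0(t) + A_r^1(t)} = H_{01}(t), \tag{50}$$

and hence $h_{011}(t) + h_{100}(t) = h_{01}(t)$, so that under the null
$\phi_0 = 1$, the variance $\sigma^2$ simplifies to

$$\sigma^2 = \left(\int_0^\tau h_{01}(t)\lambda_0(t)\,dt\right)^{-1}. \tag{51}$$

Under the full cohort case, it has been long known that the Mantel-Haenszel
estimator, using $a(u,v) = (u+v)^{-1}$, has the same asymptotic variance as
the Maximum Partial Likelihood Estimator (MPLE) at the null. We close this
section by noting that this result extends to sampling schemes in general,
that is, that (51) is the null asymptotic variance of the MPLE derived in BGL.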

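In general, we let

$$S_r^{(0)}(\phi,t) = \sum_{i \in r} \phi^{Z_i(t)} \pi_t(r|i) = \sum_{k=0}^{\eta} \phi^{a_k} A_r^k(t), \tag{52}$$

$$S_r^{(1)}(\phi,t) = \sum_{i \in r} Z_i(t)\, \phi^{Z_i(t)-1} \pi_t(r|i) = \sum_{k=1}^{\eta} a_k \phi^{a_k-1} A_r^k(t),$$

$$\text{and}\quad E_r(\phi,t) = \frac{S_r^{(1)}(\phi,t)}{S_r^{(0)}(\phi,t)}, \tag{53}$$

and recall $a_0 = 0$; we apply the convention that $0/0 = 0$.
In the classical case $\eta = 1$, under the null $\phi_0 = 1$,

$$S_r^{(0)}(1,t) = \sum_{i \in r} \pi_t(r|i) = A_r^0(t) + A_r^1(t), \quad\text{and}\quad S_r^{(1)}(1,t) = A_r^1(t).$$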

Referring now to (3.4) of BGL (where $\beta = 0$ there corresponds to
$\phi_0 = 1$ here), since $Z^2 = Z$ when $Z \in \{0,1\}$, the sum there
involving $Z^2$ coincides with $S_r^{(1)}(1,t) = A_r^1(t)$.
The integrand against the baseline hazard function in (3.10) of BGL, which
yields the inverse variance of the MPLE, simplifies in this case to

$$\sum_{r \subset \mathcal{R}} \left[\frac{A_r^1(t)}{A_r^0(t) + A_r^1(t)} - \left(\frac{A_r^1(t)}{A_r^0(t) + A_r^1(t)}\right)^2\right] \left[A_r^0(t) + A_r^1(t)\right] = \sum_{r \subset \mathcal{R}} \frac{A_r^0(t) A_r^1(t)}{A_r^0(t) + A_r^1(t)},$$

in agreement with (50), showing the variances of the MPLE in BGL and of
the Mantel-Haenszel estimator, at the null, are equal, for sampling in general.



4 Baseline Hazard Estimation 

To study the baseline hazard estimate (16), we recall definitions (52) and
(53), and impose the following additional conditions.

Condition 5 The ratio $n(t)/n$ is uniformly bounded away from zero in
probability as $n \to \infty$.

Condition 6 There exist functions $e$ and $\psi$ such that for all
$t \in [0,\tau]$ as $n \to \infty$,

$$\sum_{r \subset \mathcal{R}} \pi_t(r)\, E_r(\phi_0,t) \to_p e(\phi_0,t), \tag{54}$$

and

$$n \sum_{r \subset \mathcal{R}} \pi_t(r)^2 \{S_r^{(0)}(\phi_0,t)\}^{-1} \to_p \psi(\phi_0,t). \tag{55}$$

Letting $t_1 < t_2 < \cdots$ be the collection of all failure times, and
$\mathcal{R}_j$ the sampled risk set at failure time $t_j$, we rewrite the
cumulative baseline hazard estimate (16) as

$$\hat\Lambda_n(t, \hat\phi_n) = \sum_{t_j \le t} \frac{1}{\sum_{i \in \mathcal{R}_j} \hat\phi_n^{Z_i(t_j)}\, w_i(t_j, \mathcal{R}_j)},$$

where the weights $w_i(t,r)$ are given in (6).
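
The sum over failure times is straightforward to evaluate. The following
sketch computes the cumulative baseline hazard estimate as the sum, over
failure times up to $t$, of the reciprocal of the weighted relative-risk total
of the sampled risk set. The record layout and the toy data are assumptions
of this example; with all weights equal to one and no exposed subjects it
reduces to the Nelson-Aalen estimator.

```python
def cumulative_baseline_hazard(failure_sets, phi, t):
    """Cumulative baseline hazard estimate at time t from sampled risk sets.

    Each element of failure_sets is a dict with keys
      'time'   : the failure time t_j,
      'sampled': list of (covariate value Z_i(t_j), weight w_i) pairs for
                 all members of the sampled risk set, the case included.
    """
    total = 0.0
    for fs in failure_sets:
        if fs["time"] > t:
            continue
        denom = sum(phi ** z * w for z, w in fs["sampled"])
        if denom > 0:  # the ratio is taken as 0 when no one is at risk
            total += 1.0 / denom
    return total


# Full cohort of three unexposed subjects (all weights equal to one).
unexposed = [
    {"time": 1.0, "sampled": [(0, 1.0), (0, 1.0), (0, 1.0)]},
    {"time": 2.0, "sampled": [(0, 1.0), (0, 1.0)]},
]
nelson_aalen = cumulative_baseline_hazard(unexposed, phi=2.0, t=2.0)

# One exposed and two unexposed subjects at risk, rate ratio 2.
mixed = [{"time": 1.0, "sampled": [(1, 1.0), (0, 1.0), (0, 1.0)]}]
increment = cumulative_baseline_hazard(mixed, phi=2.0, t=1.5)
```

In the second example the single increment is $1/(2 + 1 + 1) = 0.25$.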

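Theorem 4.1 Let Conditions 1-6 hold, and with $e(\phi_0,t)$ in (54) set

$$B(t,\phi_0) = \int_0^t e(\phi_0,u)\lambda_0(u)\,du.$$

Then $n^{1/2}(\hat\phi_n - \phi_0)$ and the process

$$X_n(\cdot) = n^{1/2}\left(\hat\Lambda_n(\cdot,\hat\phi_n) - \Lambda_0(\cdot)\right) + n^{1/2}(\hat\phi_n - \phi_0)\, B(\cdot,\phi_0)$$

are asymptotically independent. The limiting distribution of $X_n(\cdot)$ is,
with $\psi(\phi_0,t)$ as in (55), that of a mean-zero Gaussian martingale with
variance function

$$\omega^2(t,\phi_0) = \int_0^t \psi(\phi_0,u)\lambda_0(u)\,du.$$

In particular, the scaled difference between the estimated and true integrated
hazard

$$\sqrt{n}\left(\hat\Lambda_n(\cdot,\hat\phi_n) - \Lambda_0(\cdot)\right)$$

converges weakly as $n \to \infty$ to a mean zero Gaussian process with
covariance function

$$\sigma_\Lambda^2(s,t) = \omega^2(s \wedge t, \phi_0) + B(s,\phi_0)\,\sigma^2\, B(t,\phi_0).$$

The function $\sigma_\Lambda^2(s,t)$ can be estimated uniformly consistently
by $\hat\sigma_\Lambda^2(s,t)$ where

$$\hat\sigma_\Lambda^2(s,t) = \hat\omega^2(s \wedge t; \hat\phi_n) + \hat B_n(s; \hat\phi_n)\,\hat\sigma^2\, \hat B_n(t; \hat\phi_n),$$

$$\hat\omega^2(t;\phi) = n \sum_{t_j \le t} \frac{1}{\left(\sum_{i \in \mathcal{R}_j} \phi^{Z_i(t_j)} w_i(t_j,\mathcal{R}_j)\right)^2},$$

$$\hat B_n(t;\phi) = \sum_{t_j \le t} \frac{\sum_{i \in \mathcal{R}_j} Z_i(t_j)\, \phi^{Z_i(t_j)-1} w_i(t_j,\mathcal{R}_j)}{\left(\sum_{i \in \mathcal{R}_j} \phi^{Z_i(t_j)} w_i(t_j,\mathcal{R}_j)\right)^2},$$

and $\hat\sigma^2$ is any consistent estimator of $\sigma^2$ of (41), such as
(45).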
Proof: The form of $\hat\Lambda_n$ is the same as in BGL, and noting in particular that Condition 4 in BGL can be satisfied by letting $X_r(t) = \max_{0 \le j \le \eta} a_j$ and $D(t)$ a constant, we have that $X_n(\cdot)$ is asymptotically equivalent to a local square integrable martingale $Y_n(\cdot)$, and the proof of the claims made about the asymptotic distribution of $X_n$ now follows as there.

Regarding the asymptotic independence, note that for any j < k, r C 71, 
and locally bounded predictable processes Hr, 



Jo r Jo 

[ V arCpo'A^^Hrd < Mi, Ml >s - [ Y ^Po'^rAiHrd < M^, M^ >, 

Jo J. Jo r 

r 

Hence, by the asymptotic joint normality provided by Rebolledo's Theorem
II.5.1 in Andersen et al. (1993), functions of the collections $\{\int H_r\,dM_r\}_r$
and {(pQ'^Wjk — (j>o^Wkj}j<k, in particular X„(-) and n~^/'^Un{(po) , are asymp- 
totically independent. But by (jlZj), n~^/^Z//„(0o) and a non-zero constant 
multiple of \/n{(j)n — 0o) are asymptotically equivalent. 

The claim that $\sigma_\Lambda^2(s,t)$ can be estimated uniformly consistently by $\hat\sigma_\Lambda^2(s,t)$ follows as in BGL, based on the fact that $\hat\omega^2(t,\phi_0)$ is the optional variation process of the local square integrable martingale $Y_n(\cdot)$, which by Rebolledo's theorem as cited above converges uniformly in probability to its predictable variation $\omega^2(t,\phi_0)$; the uniform convergence of $\hat B_n(\cdot,\hat\phi_n)$ to $B(\cdot,\phi_0)$ is as in BGL, Proposition 2. ■



5 Examples 

We apply our results to the designs discussed in Section [H highlighting the 
classical case where ?7 = 1, ao = 0, and ai = 1, with the canonical choice of a 
given in (fTTj) . Though our asymptotic results hold under the weaker stability
conditions of Sections El and El here assume that the censoring, covariate 
and strata variables are i.i.d. copies of $Y(t)$, $Z(t)$ and $C(t)$ respectively,
left continuous and adapted processes having right hand limits. The strata
variable needed for Designs 3 and 4 gives the 'type' of individual among the
possible values in a (small) finite set $\mathcal{C}$; the strata variable may be used to
model any additional information, a surrogate of exposure in particular.

For each of the Designs 1 through 4 we verify that Conditions 1 through
6 are satisfied, and determine the standardized asymptotic distributions of
/3n and A„. We assume that r < oo, and so, since Aq is already assumed 
bounded away from infinity, the finite interval Condition ^ holds. As already 
noted, due to our choice of the (standard) function a as in (fTTj) . only the 
convergence of hn,vit) to h^it) for |v| G {2,3} is required in order to satisfy 
Condition 121 To satisfy Conditional letting 

$$f_k(t) = P(Z(t) = a_k \mid Y(t) = 1) \quad \text{for } k = 0,\ldots,\eta,$$

for Designs 1 and 2 we assume that for some $j < k$ with $c_{jk} > 0$ there is a
non-trivial interval of time $[a,b] \subset [0,\tau]$ over which both $f_j(t)$ and $f_k(t)$ are
bounded away from 0. In typical cases, one would have $c_{jk} > 0$ for all pairs
$j < k$ in order to take maximum advantage of the available information, and
there would be a positive probability in some intervals of time that an at risk
individual has covariate $a_k$; in such a situation any pair $j < k$ can be used
to demonstrate the satisfaction of Conditional 
Let 

$$q_l(t) = P(C(t) = l \mid Y(t) = 1),$$

and

$$f_{k,l}(t) = P(Z(t) = a_k \mid C(t) = l,\ Y(t) = 1), \quad k = 0,\ldots,\eta,\ l \in \mathcal{C}.$$

For Design 121 to satisfy Condition |21 we assume that there exists a pair j < k 
with $c_{jk} > 0$ and $l \in \mathcal{C}$ with $m_l \ge 2$ such that over some non-trivial interval
$[a,b] \subset [0,\tau]$ the functions $q_l(t)$, $f_{j,l}(t)$ and $f_{k,l}(t)$ are bounded away from
zero. That is, there is some stratum in which a comparison of individuals
can be made, and in that stratum, the covariate value is not a constant.

For DesignjH to satisfy ConditionjSlwe assume either i) the assumption for 
Design 3 holds, or ii) that there exists a pair $j < k$ with $c_{jk} > 0$ and for some
unequal pair $l_1, l_2$ the functions $q_{l_1}(t)$, $q_{l_2}(t)$, $f_{j,l_1}(t)$, $f_{k,l_2}(t)$ are bounded away
from zero. That is, we need to assume either that a meaningful comparison
can be drawn i) within a stratum or ii) between two different strata. Design 2
is a special case of Designs 3 and 4 with $\mathcal{C} = \{1\}$, $m_1 \ge 2$, and $q_1(t) = 1$ and
so i) recovers the assumption in Design |21 used to ensure Condition jSl 

As noted above, Condition 01 holds due to our choice of function a. Con- 
dition El is satisfied using that r < oo, and assuming that 

$$\inf_{t \in [0,\tau]} p(t) > 0, \quad \text{where } p(t) = P(Y(t) = 1);$$

one needs only to invoke the strong law of large numbers in D[0, 1] of Rao 
(1963) (after reversing the time axis), similar to BGL. We show Condition 
6 is satisfied in each of our examples below by proving convergence to, and
identifying, the indicated limiting functions. In summary, in each of the ex- 
amples which follow, we need only verify Conditions jHEl andjHl Throughout 
we let 

$$n(t) = |\mathcal{R}(t)| \quad \text{and} \quad \rho_n(t) = n(t)/n.$$

Design 1 Full Cohort. In this situation all individuals who are at risk at
the time of failure are sampled, giving $\pi_t(r \mid i) = 1(r = \mathcal{R}(t))$. Recalling that
$n_k(t) = |\mathcal{R}_k(t)|$, the number of individuals in $\mathcal{R}(t)$ with covariate $k$ at time
$t$, we have

$$A_r^k(t) = \sum_{i \in \mathcal{R}_k(t)} \pi_t(r \mid i) = n_k(t)\,1(r = \mathcal{R}(t)).$$

By M^) aTi{t){t) = n{t)~^ , and with Tj the collection of failure times of indi- 
viduals having exposure], 



RAr) = TE^rW^rW^AT^W = r^dNj^ (t) = E 



nit) 



in agreement with ^}). Using |F?1 ), for |v| = 2,3, 

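$$h_{n,v}(t) = \frac{1}{n} \sum_{r \subset \mathcal{R}} a_r(t)^{|v|-1} \prod_{k \in v} A_r^k(t) = \rho_n(t) \prod_{k \in v} \frac{n_k(t)}{n(t)} \to_p p(t) \prod_{k \in v} f_k(t) = h_v(t);$$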

hence Condition is satisfied. Using that Aq is bounded away from zero. 
Condition is satisfied by the pair j < k for which fj{t) and fk{t) are 
assumed bounded away from zero over some interval. 
It remains only to verify Condition 6. By (54) and (55),

fc=0 fc=l 

so 

Hence we identify the limiting functions as 

thereby fulfilling Condition 6.

In the classical case, we may write the numerator of ^9}) as 



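$$\int_0^\tau \left(\phi_0^2 f_1(t) + \phi_0 f_0(t)\right) p(t) f_0(t) f_1(t) \lambda_0(t)\,dt = \int_0^\tau \left(\phi_0^2 f_1(t) + \phi_0 f_0(t)\right) h_{01}(t) \lambda_0(t)\,dt$$

and so

$$\sigma^2 = \frac{\int_0^\tau \left(\phi_0^2 f_1(t) + \phi_0 f_0(t)\right) h_{01}(t) \lambda_0(t)\,dt}{\left(\int_0^\tau h_{01}(t) \lambda_0(t)\,dt\right)^2}.$$

For the parameters in the asymptotic distribution of the estimate of the baseline hazard, we have in this case

$$e(\phi_0,t) = \frac{f_1(t)}{f_0(t) + \phi_0 f_1(t)} \quad \text{and} \quad \psi(\phi_0,t) = \frac{p(t)^{-1}}{f_0(t) + \phi_0 f_1(t)}.$$

Specializing further to the null case $\phi_0 = 1$, we have $f_0(t) + \phi_0 f_1(t) = f_0(t) + f_1(t) = 1$, and hence

$$\sigma^2 = \frac{1}{\int_0^\tau p(t) f_0(t) f_1(t) \lambda_0(t)\,dt}$$

and

$$e(\phi_0,t) = f_1(t) \quad \text{and} \quad \psi(\phi_0,t) = p(t)^{-1}.$$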
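In the next three examples we require certain limits of the multivariate
hypergeometric distribution

$$\mathbf{X} \sim \mathcal{H}_{\eta+1}(\mathbf{n}, m)$$

having integer parameters $\eta > 0$, $m > 0$, and $\mathbf{n} = (n_0,\ldots,n_\eta)$ a vector of
non-negative integers, whose $j$th component $X_j$ counts the number of items
of type $j$ contained in a sample without replacement of size $m$ taken from a
set having $n_j$ items of type $j$. That is, for $\mathbf{x} = (x_0,\ldots,x_\eta)$ with non-negative
integer components and $|\mathbf{x}| = x_0 + \cdots + x_\eta$,

$$P(\mathbf{X} = \mathbf{x}) = \frac{\prod_{j=0}^{\eta} \binom{n_j}{x_j}}{\binom{n}{m}}, \quad \text{for } |\mathbf{x}| = m \text{ and } |\mathbf{n}| = n. \qquad (57)$$

Proposition 5.1 Let $\mathbf{X}$ have distribution (57). If $n_j/n \to f_j \in [0,1]$ for all
$j = 0,\ldots,\eta$ as $n \to \infty$, then for all bounded continuous functions $G$,

$$EG(\mathbf{X}) \to EG(\mathbf{Y}) \quad \text{as } n \to \infty, \qquad (58)$$

where the vector $\mathbf{Y} \sim \mathcal{M}(\mathbf{f}, m)$ has the multinomial distribution

$$P(\mathbf{Y} = \mathbf{x}) = \binom{m}{\mathbf{x}} \prod_{j=0}^{\eta} f_j^{x_j} \quad \text{for } |\mathbf{x}| = m \text{ and } \binom{m}{\mathbf{x}} = \frac{m!}{\prod_{j=0}^{\eta} x_j!},$$

whose $j$th component $Y_j$ counts the number of items of type $j$ included when
$m$ items are sampled with replacement from a population where the fraction
of type $j$ items is $f_j$.

In particular, for distinct $j, k, l$ we have convergence of the moments

$$EX_j \to m f_j,$$
$$EX_j X_k \to (m)_2 f_j f_k, \qquad EX_j^2 \to m f_j + (m)_2 f_j^2,$$
$$EX_j X_k X_l \to (m)_3 f_j f_k f_l, \qquad EX_j^2 X_k \to (m)_2 f_j f_k + (m)_3 f_j^2 f_k,$$

and

$$EX_j^3 \to m f_j + 3(m)_2 f_j^2 + (m)_3 f_j^3.$$

If $\mathbf{n}/n \to_p \mathbf{f}$, a (possibly random) vector of limiting frequencies, then

$$E[G(\mathbf{X}) \mid \mathbf{n}] \to_p E[G(\mathbf{Y}) \mid \mathbf{n}] \quad \text{as } n \to \infty, \qquad (59)$$

and similarly for the convergence of the indicated moments.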
Proof: The convergence in distribution, giving (58), of the hypergeometric
to the multinomial is well known, and may be shown, for example, by
coupling the two distributions so they are equal except on the set of vanishingly
small probability where the sample with replacement includes some
individual more than once. The convergence of the indicated moments of
the hypergeometric to the corresponding moments of the multinomial now
follows from the boundedness given by $|\mathbf{X}| = m$.

When $\mathbf{n}/n \to_p \mathbf{f}$, for every subsequence of $n$ there exists a further subsequence
where $\mathbf{n}/n \to \mathbf{f}$ almost surely, and the first part of the Proposition gives
almost sure convergence of $E[G(\mathbf{X}) \mid \mathbf{n}]$ to $E[G(\mathbf{Y}) \mid \mathbf{n}]$ along this subsequence.
Hence the full sequence converges in probability. ■
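
The moment limits of Proposition 5.1 are easy to check numerically. The sketch below computes exact multivariate hypergeometric moments by brute-force enumeration and compares them with the multinomial limits $m f_j$, $(m)_2 f_j f_k$, $m f_j + (m)_2 f_j^2$, $(m)_2 f_j f_k + (m)_3 f_j^2 f_k$ and $m f_j + 3(m)_2 f_j^2 + (m)_3 f_j^3$. Function names and the example counts are assumptions chosen for illustration; enumeration is only practical for small $m$ and few types.

```python
from itertools import product
from math import comb, prod


def falling(m: int, k: int) -> int:
    """Falling factorial (m)_k = m (m - 1) ... (m - k + 1)."""
    return prod(m - i for i in range(k))


def hypergeom_moment(counts, m, powers):
    """Exact E[prod_j X_j ** powers[j]] for X ~ H(counts, m), by enumeration."""
    denom = comb(sum(counts), m)
    total = 0.0
    for x in product(range(m + 1), repeat=len(counts)):
        if sum(x) != m:
            continue
        ways = prod(comb(n_j, x_j) for n_j, x_j in zip(counts, x))
        value = prod(x_j ** p_j for x_j, p_j in zip(x, powers))
        total += value * ways / denom
    return total


def multinomial_limits(f, m, j, k):
    """Limiting moments listed in the proposition, for distinct types j and k."""
    return {
        "X_j": m * f[j],
        "X_j X_k": falling(m, 2) * f[j] * f[k],
        "X_j^2": m * f[j] + falling(m, 2) * f[j] ** 2,
        "X_j^2 X_k": falling(m, 2) * f[j] * f[k] + falling(m, 3) * f[j] ** 2 * f[k],
        "X_j^3": m * f[j] + 3 * falling(m, 2) * f[j] ** 2 + falling(m, 3) * f[j] ** 3,
    }


# Hypothetical population: n = 10000 items with type fractions (0.5, 0.3, 0.2).
counts, f, m = (5000, 3000, 2000), (0.5, 0.3, 0.2), 4
limits = multinomial_limits(f, m, 0, 1)
print(hypergeom_moment(counts, m, (2, 1, 0)), limits["X_j^2 X_k"])
```

The finite-population moments differ from their limits by terms of order $m^2/n$, which is why a risk set of moderate size already behaves like sampling with replacement.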

In what follows we suppress the conditioning on $\mathbf{n}$.

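Design 2 Simple Random Sampling. The sampling probabilities for this design
are given by

$$\pi_t(r \mid i) = \binom{n(t)-1}{m-1}^{-1} 1(r \ni i,\ |r| = m,\ r \subset \mathcal{R}(t)),$$

yielding that for $r \subset \mathcal{R}(t)$ with $|r| = m$ we have $\pi_t(r) = \binom{n(t)}{m}^{-1}$, and letting
$\mathbf{r}_k(t) = \{i \in r : Z_i(t) = a_k\}$ and $r_k(t) = |\mathbf{r}_k(t)|$,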

and by H^) 

1 fn{t) - 1\ 
art = - T • 

Hence 

rC7^ kev ^ ^ \r\=m,rCTZ{t) fegv 

|r|=m,rC7^(^) fcGv fcSv 

where Xk{t) is the k^^ component of the multivariate hypergeometric vector 
X(t) ~ 7i^_,_i(n(t), m) with n(t) = (?7,o(t), . . . , 



n{t) - l\ 




n{t) (n{t) 



nm 



|v| 
Taking limits for $j, k$ distinct, using Proposition 5.1, for $|v| = 2$

$$h_{jk}(t) = p(t)\,\frac{m-1}{m}\,f_j(t) f_k(t),$$

while for $|v| = 3$,

$$h_{jjk}(t) = \frac{p(t)}{m^3}\left((m)_2 f_j(t) f_k(t) + (m)_3 f_j^2(t) f_k(t)\right)$$

and with $j, k, q$ distinct,

$$h_{jkq}(t) = \frac{p(t)}{m^3}\,(m)_3 f_j(t) f_k(t) f_q(t);$$

hence Condition is satisfied. Condition is verified here as it was for 
DesignU\ 

We begin the verification of Condition 6 by determining the limiting value
e{(j)o,t) of Using ([2^ and 



rClZ rClZ 



Writing this expression as an expectation with respect to the multivariate 
hypergeometric distribution and taking the limit using Proposition I5.il gives 

Similarly, $\psi(\phi_0,t)$ of (55) is the limit



rCTZ 

m \ ^ / i \ / m 



Hence, Condition 6 is satisfied.
In the classical case ^y{ ) yields

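$$\sigma^2 = \frac{\phi_0 \int_0^\tau p(t) f_0(t) f_1(t) \left[(1+\phi_0) + (f_0(t) + \phi_0 f_1(t))(m-2)\right] \lambda_0(t)\,dt}{(m-1)\left(\int_0^\tau p(t) f_0(t) f_1(t) \lambda_0(t)\,dt\right)^2}, \qquad (60)$$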

and the formulas above specialize to 

Under the null $\phi_0 = 1$, in the numerator of (60) we have

$$(1+\phi_0) + (f_0(t) + \phi_0 f_1(t))(m-2) = 2 + (f_0(t) + f_1(t))(m-2) = m,$$

and therefore

$$\sigma^2 = \frac{m}{m-1}\,\frac{1}{\int_0^\tau p(t) f_0(t) f_1(t) \lambda_0(t)\,dt},$$

giving an asymptotic relative efficiency of $(m-1)/m$ with respect to the
full cohort variance, the same relative efficiency as the MPLE, as was

expected by the argument supplied at the end of Section\^ Lastly, in the null 
case 

$$e(\phi_0,t) = f_1(t) \quad \text{and} \quad \psi(\phi_0,t) = p(t)^{-1}.$$
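
The efficiency statement for simple random sampling can be checked numerically. The sketch below is an illustration under stated assumptions: it reads (60) as $\sigma^2 = \phi_0 \int p f_0 f_1 [(1+\phi_0) + (f_0 + \phi_0 f_1)(m-2)] \lambda_0\,dt \,/\, \{(m-1)(\int p f_0 f_1 \lambda_0\,dt)^2\}$ and the full cohort variance in the classical case as $\int (\phi_0^2 f_1 + \phi_0 f_0)\, p f_0 f_1 \lambda_0\,dt \,/\, (\int p f_0 f_1 \lambda_0\,dt)^2$. The functions `p`, `f1` and `lam0` and their numerical values are hypothetical inputs, and the trapezoid rule stands in for exact integration.

```python
from math import exp


def integrate(g, a, b, steps=2000):
    """Composite trapezoid rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for i in range(1, steps):
        total += g(a + i * h)
    return total * h


def sigma2_full(phi0, p, f1, lam0, tau):
    """Full cohort variance, classical case (binary exposure), as read above."""
    base = lambda t: p(t) * (1 - f1(t)) * f1(t) * lam0(t)
    num = integrate(lambda t: (phi0 ** 2 * f1(t) + phi0 * (1 - f1(t))) * base(t), 0.0, tau)
    return num / integrate(base, 0.0, tau) ** 2


def sigma2_srs(phi0, m, p, f1, lam0, tau):
    """Variance under simple random sampling of m - 1 controls, as read from (60)."""
    base = lambda t: p(t) * (1 - f1(t)) * f1(t) * lam0(t)
    bracket = lambda t: (1 + phi0) + ((1 - f1(t)) + phi0 * f1(t)) * (m - 2)
    num = phi0 * integrate(lambda t: base(t) * bracket(t), 0.0, tau)
    return num / ((m - 1) * integrate(base, 0.0, tau) ** 2)


# Hypothetical inputs: at-risk probability, exposure prevalence, baseline hazard.
p = lambda t: exp(-0.3 * t)
f1 = lambda t: 0.2
lam0 = lambda t: 0.1
for m in (2, 3, 5):
    are = sigma2_full(1.0, p, f1, lam0, 5.0) / sigma2_srs(1.0, m, p, f1, lam0, 5.0)
    print(m, round(are, 6))  # relative efficiency at the null
```

Away from the null the ratio depends on $\phi_0$ and $f_1$, but as $m$ grows the sampled variance approaches the full cohort value, since $[(1+\phi_0) + (f_0 + \phi_0 f_1)(m-2)]/(m-1) \to f_0 + \phi_0 f_1$.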

Previous efficiency work used a recursive representation of the factorial
moments of the extended hypergeometric distribution (Harkness (1965)) to derive
an asymptotic variance expression for "small strata" case-control data
(Breslow (1981), Hauck and Donner (1988)). The expressions so derived are
the special case of (60) when there is a single case per set, a simplification
that has not been previously described. Figure 1 shows efficiency curves relative
to the maximum partial likelihood estimator (MPLE) as a function of
$\log \phi_0$ by $m$ when $f_1(t) = .2$. As noted previously in Breslow (1981), the
Mantel-Haenszel estimator has high efficiency relative to the MPLE over a
fairly large region around the null.

In the next two designs we consider, for $r \subset \mathcal{R}(t)$ let

$$\mathbf{r}_{k,l}(t) = \{i \in r : Z_i(t) = k,\ C_i(t) = l\}, \quad r_{k,l}(t) = |\mathbf{r}_{k,l}(t)|,$$

and

$$n_{k,l}(t) = |\mathcal{R}_k(t) \cap \mathcal{C}_l(t)|,$$

the number of individuals having covariate $k$ and type $l$ at time $t$. With
$\mathbf{n}_l(t) = (n_{0,l}(t),\ldots,n_{\eta,l}(t))$, let

$$\mathbf{X}_l(t) \sim \mathcal{H}_{\eta+1}(\mathbf{n}_l(t), m_l), \quad l \in \mathcal{C} \qquad (61)$$

be independent multivariate hypergeometric vectors.

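Design 3 The sampling probabilities for the matching design are given by

$$\pi_t(r \mid i) = \binom{c_{C_i(t)}(t)-1}{m_{C_i(t)}-1}^{-1} 1\left(r \subset \mathcal{C}_{C_i(t)}(t),\ r \ni i,\ |r| = m_{C_i(t)}\right),$$

where $\mathcal{C}$ is a set of types, $\mathcal{C}_l(t)$ are all those of type $l \in \mathcal{C}$ at risk at time $t$, of
which there are $c_l(t)$.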

For r C Ci{t) with |r| = mi, we have 

Since for such r we have '^k=o^k,i(t) = mi, summing over k yields 

ar{t) = ^h^^^)lircCiit),\r\=mi). 
ci[t) \mi J 

Hence, 

rCR. feGv 
l€C rCCi{t),\r\=mi fcGv 

/eC ^ ' rCCi{t),\r\=m.i kGv 

Then with $\mathbf{X}_l(t)$ as in (61) we can write

l&C ' fcev 
31 



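For $v = \{k_1, k_2\}$ distinct, taking limits using Proposition 5.1 we find

$$h_v(t) = p(t) \sum_{l \in \mathcal{C}} \frac{m_l - 1}{m_l}\, q_l(t) f_{k_1,l}(t) f_{k_2,l}(t). \qquad (62)$$

For $v = \{k_1, k_1, k_2\}$ with $k_1, k_2$ distinct, we have

$$h_v(t) = p(t) \sum_{l \in \mathcal{C}} q_l(t) \left(\frac{(m_l)_2}{m_l^3} f_{k_1,l}(t) f_{k_2,l}(t) + \frac{(m_l)_3}{m_l^3} f_{k_1,l}^2(t) f_{k_2,l}(t)\right),$$

and for $v = \{k_1, k_2, k_3\}$ all distinct,

$$h_v(t) = p(t) \sum_{l \in \mathcal{C}} \frac{(m_l-1)(m_l-2)}{m_l^2}\, q_l(t) f_{k_1,l}(t) f_{k_2,l}(t) f_{k_3,l}(t).$$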

lec ' 

Hence Condition\^is satisfied. Condition\^is satisfied in a manner similarly 
as for Design{^ with the additional assumption that mi > 2, ensuring that 
$(m_l - 1)/m_l$ in (62) is positive.
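
To make the role of the factor $(m_l - 1)/m_l$ concrete, here is a small Python sketch of the two-index limit for the matching design, read as $h_v(t) = p(t) \sum_{l} \frac{m_l - 1}{m_l} q_l(t) f_{k_1,l}(t) f_{k_2,l}(t)$ at a fixed time $t$. The function name and the list-based layout of `q`, `f` and `m` are assumptions for illustration. A stratum with $m_l = 1$ contributes nothing, and a single stratum with $q_1 = 1$ gives back the simple random sampling value $p\,\frac{m-1}{m} f_j f_k$.

```python
def h_pair_matching(p, q, f, m, k1, k2):
    """Two-index limit for the matching design at a fixed time.

    p: probability of being at risk; q[l]: stratum probabilities;
    f[k][l]: P(covariate level k | stratum l, at risk); m[l]: sampled set size in stratum l.
    """
    return p * sum(
        (m[l] - 1) / m[l] * q[l] * f[k1][l] * f[k2][l] for l in range(len(q))
    )


# Single stratum: reduces to simple random sampling of m - 1 = 2 controls.
print(h_pair_matching(0.9, [1.0], [[0.8], [0.2]], [3], 0, 1))

# Two strata; the second samples the case only (m_l = 1) and so carries no information.
print(h_pair_matching(0.9, [0.6, 0.4], [[0.7, 0.9], [0.3, 0.1]], [3, 1], 0, 1))
```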

For the verification of Condition 6 we have



nit) \ mi J 



it), |r| = nil), 



so for the limiting value $e(\phi_0,t)$ of (54) we have

^nit)\mij ^ n0n'^fciW 

2-^ r,ii\ 



nit) V ELo<l>o''XM 



Similarly, $\psi(\phi_0,t)$ of (55) is the limit



leC \xi\—mi 
and Condition 6 is satisfied.

For the classical case, from \4f^ , 

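$$\sigma^2 = \frac{\phi_0 \int_0^\tau p(t) \sum_{l \in \mathcal{C}} \frac{m_l - 1}{m_l^2}\, q_l(t) f_{0,l}(t) f_{1,l}(t) \left[(1+\phi_0) + (f_{0,l}(t) + \phi_0 f_{1,l}(t))(m_l - 2)\right] \lambda_0(t)\,dt}{\left(\int_0^\tau p(t) \sum_{l \in \mathcal{C}} \frac{m_l - 1}{m_l}\, q_l(t) f_{0,l}(t) f_{1,l}(t) \lambda_0(t)\,dt\right)^2}$$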

and the formulas above specialize to 
and 

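Specializing further, under the null $\phi_0 = 1$,

$$\sigma^2 = \left(\int_0^\tau p(t) \sum_{l \in \mathcal{C}} \frac{m_l - 1}{m_l}\, q_l(t) f_{0,l}(t) f_{1,l}(t) \lambda_0(t)\,dt\right)^{-1},$$

$$e(\phi_0,t) = \sum_{l \in \mathcal{C}} q_l(t) f_{1,l}(t), \quad \text{and} \quad \psi(\phi_0,t) = p(t)^{-1}.$$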

Remaining in the classical case, we more generally consider the matching
framework where each stratum $l \in \mathcal{C}$ has its own baseline $\lambda_l(t)$, so that if
$C_i(t) \in \mathcal{C}$ is the stratum of individual $i$ at time $t$, the observed failure intensity
for individual $i$ is given by

$$\lambda_i(t) = Y_i(t)\,\phi_0^{Z_i(t)}\,\lambda_{C_i(t)}(t).$$

Even in this extended model, it remains true that

$$R_{10}(t) - \phi_0 R_{01}(t)$$

is a local square integrable martingale. We can guarantee the consistency of 
(pn by letting Condition U\ hold with Xi{t) replacing Xo{t), and Condition 
hold with 

rCCi(t) fcev 
and its scaled limit h^^i{t) replacing and H^{t) and h^{t) respectively, for each
I E C. In addition, when Condition^ holds for each Ci{t) replacing TZ for 
each I G C, the asymptotic variance of (pn for the matching design with strata 
specific baseline hazard Xi(t) is given by 



a 



lo Y.iaci^lhmiA'^) + (pohioo,i{t))\i{t)dt 



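Design 4 The sampling probabilities for the counter matching design are
given by

$$\pi_t(r \mid i) = \left[\prod_{l \in \mathcal{C}} \binom{c_l(t)}{m_l}\right]^{-1} \left[\frac{m_{C_i(t)}}{c_{C_i(t)}(t)}\right]^{-1} 1\left(r \ni i,\ r \in \mathcal{P}_{\mathcal{C}}(t)\right),$$

where $\mathcal{C}$ is a set of types, $\mathcal{P}_{\mathcal{C}}(t)$ the collection of sets $r \subset \mathcal{R}(t)$ with $m_l$ subjects
of type $l$ at time $t$, $c_l(t)$ is the number of type $l$ subjects in $\mathcal{R}(t)$, and $C_i(t)$
the type of subject $i$ at time $t$; consequently,

$$\pi_t(r) = \left[\prod_{l \in \mathcal{C}} \binom{c_l(t)}{m_l}\right]^{-1} 1\left(r \in \mathcal{P}_{\mathcal{C}}(t)\right). \qquad (63)$$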



Letting rk^i{t) = {i e r : Zi{t) = k,Ci{t) = I}, and Vk^iit) = \rk,i{t)\ we 
have 



n 

.l£C 



m 



-1 



< ' m 1 / 



an 



dforreVcit), by Hf, 



ar(t) 



n{t) 



n 



Clit) 

mi 



l(r G Vc{t)). 



Hence, 



van 
n 



kev 



n 

.leC 



Clit) 
mi 



E HE 

reVc(t) kev \ leC 



rk,i(t) cijt) 
mi n{t) 
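Now we can write the above sum as the expectation

$$E\left(\prod_{k \in v} \sum_{l \in \mathcal{C}} \frac{X_{k,l}(t)\,c_l(t)}{m_l\,n(t)}\right) = E\left(\sum_{l_p \in \mathcal{C},\ p = 1,\ldots,|v|}\ \prod_{k_p \in v} \frac{X_{k_p,l_p}(t)\,c_{l_p}(t)}{m_{l_p}\,n(t)}\right). \qquad (64)$$

For the case $|v| = 2$ with $v = \{k_1, k_2\}$ distinct, the expectation in (64)
expands into the diagonal and off diagonal sums,

$$E\left(\sum_{l \in \mathcal{C}} \frac{X_{k_1,l}(t) X_{k_2,l}(t)\,c_l^2(t)}{m_l^2\,n^2(t)} + \sum_{l_1 \ne l_2} \frac{X_{k_1,l_1}(t) X_{k_2,l_2}(t)\,c_{l_1}(t) c_{l_2}(t)}{m_{l_1} m_{l_2}\,n^2(t)}\right).$$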

Letting 

^-.gzW and 
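and applying Proposition 5.1 we find that $h_{n,v}(t)$ converges to $p(t)$ times

$$\sum_{l \in \mathcal{C}} \frac{(m_l)_2\, f_{k_1,l}(t) f_{k_2,l}(t)\, q_l^2(t)}{m_l^2} + \sum_{l_1 \ne l_2} f_{k_1,l_1}(t) f_{k_2,l_2}(t)\, q_{l_1}(t) q_{l_2}(t)$$

$$= \sum_{l \in \mathcal{C}} \frac{m_l - 1}{m_l}\, f_{k_1,l}(t) f_{k_2,l}(t)\, q_l^2(t) + \sum_{l_1 \ne l_2} f_{k_1,l_1}(t) f_{k_2,l_2}(t)\, q_{l_1}(t) q_{l_2}(t) \qquad (65)$$

$$= f_{k_1}(t) f_{k_2}(t) - \sum_{l \in \mathcal{C}} \frac{1}{m_l}\, f_{k_1,l}(t) f_{k_2,l}(t)\, q_l^2(t). \qquad (66)$$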



Applying the assumptions made at the beginning of this section in version i)
on the first sum in (65), or in version ii) on the second sum in (65), we find
Condition satisfied. 

For |v| = "i we consider the two cases v = {fci, ki, /C3} with ki 7^ k^ and 
V = {ki, k2, ks} , all distinct. In the first case, applying Proposition \5.1l for 
the diagonal term h = h = h in expression ^64\ ), 

^ I xljt)xut)cm \ y [im)2fk,At)fUt) + imhfUt)fUt)]qf{t) 
for $l_1 = l_2 \ne l_3$,
for $l_1 = l_3 \ne l_2$,
for $l_2 = l_3 \ne l_1$,

-yfci.zi {t)Xk,,i, {t)Xk,,i, {t)ci^ {t)cl (t) \ ^ {mi,)2fk,,hit)fkui2{t)fk3,h{t)qh{t)ql{t) 
and for h,l2, h distinct, 

^1 y ^fci,h {t)Xk,,l, (t)Xfc3,;3 {t)ci, {t)ci, {t)ci, (t ) 

\ mi,mi,mun^(t) 

Yl fkuh{t)fki,i2it)fki,h{thi{t)qi2{t)qh{t)- 

\{h,l2,l3}\=3 

Summing and simplifying, we find that for v = {ki, ki, k^} with ki ^ k^, 
hy,(t) is pit) times 

fl{t)fk:At) (67) 

+ E -\fkuiit)fk3,iit) (^Kl - - (1 - qfit) 

lec ' 

+ E (—) /^lA W ifk^Mii - fk,,iAt)) - 2fk,Mfk,M] qUt)qi2it)- 

h^l2 ^ 

Similarly, for v = {ki,k2,ks} distinct, applying Proposition \5.1l we have 
for l^=l2 = 

p /^V- XkAt)Xk„iit)Xk„iit)cfit) \ ^ imi)sfk,,iit)fk„iit)fk„iit)qf{t) 
mfn^t) mf 
for $l_1 = l_2 \ne l_3$,

^(y- ^fci.^i it)Xk2,h {t)Xk,,i, {t)cl (t)Q3 (t) \ ^ K Ja/fci./i {t)fks,h it)(ll {t)qh jt) 

for /i = 7^ /2, 

^ y- {t)Xk,,i, {t)Xk,,i, {t)cl {t)ci, jt) \ ^ K j2/fci,h (t)/fc2,;2 (^)/fc3 A (^)g<2 (i^) 
/or l2 = ls^ k, 



and for /i, /2, h distinct, 

^1 Xk^,h it)Xk2,i2 it)Xk;,i, (t)Q, (t)Q3 jt ) 

\|{il,/2,«3}|=3 ^ ' 

Yl /fcl,'l(^)/fc2,/2(^)/fc3,/3(^)?il(^)*2(^)%(^)- 
|{il,/2,/3}|=3 

Summing, we find that for v = {fci, ^2, ^3} distinct, hv{t) is p{t) times 

fkAt)fk2{t)fkM 

- (— ) /fci,/iW/fe,/iW/fc3,/3Wft' 



il7^/2 



Hence the remaining |v| = 3 portion of Condition\^is satisfied. 

For the parameters in the limiting distribution of the estimator of the
cumulative baseline hazard, using (63) and applying Proposition 5.1 for each
$l \in \mathcal{C}$ yields



rcn rcn 

nil 



n 



^ ELi E;gc "fc'/'o" ^m(^)- 



reVc{t) Yl!k=0^l(^C^O^^k,l{t) 



A"fe-1„ gift) 



_ Eiec ELi «fc</'o' 



Similarly, $\psi(\phi_0,t)$ of (55) is obtained by taking the limit

1 „ / 1 



n 



J2^Kr){So{<Po,t)}-' = —-E 



rC7e ' ' \Z^iGC /l^fc=0 v^o ^"-fc/v; 

^ 5Z (^^ ^=^5 q,(t) ) n (x )^"(^)- 

Hence, Condition 6 is satisfied.

Specializing to the classical case, the functions /loi(^) d'nd /loii(t) can be 
determined from ^UB) and (f^7| ) /or /^i = 1, /C3 = respectively, and after some 
simplification using l — fkj^.i^{t) = fk2,h{'t) to obtain the following slightly more 
agreeable form for the latter, we have 



hoi{t) = (^/oW/iW-|](^)/o,/W/i,/W?fwj <^nd (68) 

= pit) {foit) flit) 

- E(;i)/o,/W/i,/W(i - 2/1,/ W)?f W ) ; 



^100 (^) the same as /iiio(t) with the roles of and 1 reversed. The value 
of a"^ can now be calculated by HJ^) - 
For the parameters in the limiting distribution for the baseline hazard
estimator, we have 

and 

We specialize further to the case where there are two strata, $|\mathcal{C}| = 2$,
and the binary strata variable $C(t) \in \{0,1\}$ is a (perhaps easily available)
surrogate for the true binary exposure $Z(t) \in \{0,1\}$. Recalling

$$f_{k,l}(t) = P(Z(t) = k \mid C(t) = l,\ Y(t) = 1), \quad k, l \in \{0,1\},$$

we have

$$f_{k,l}(t) q_l(t) = P(Z(t) = k \mid C(t) = l,\ Y(t) = 1)\,P(C(t) = l \mid Y(t) = 1)$$
$$= P(Z(t) = k,\ C(t) = l \mid Y(t) = 1) = \pi_{k,l}(t), \text{ say},$$

and

$$\delta(t) = P(C(t) = 1 \mid Z(t) = 1,\ Y(t) = 1) \quad \text{and} \quad \gamma(t) = P(C(t) = 0 \mid Z(t) = 0,\ Y(t) = 1),$$

the sensitivity and specificity of $C(t)$ for $Z(t)$. Since

$$\pi_{1,1}(t) = \delta(t) f_1(t), \quad \pi_{1,0}(t) = (1 - \delta(t)) f_1(t),$$
$$\pi_{0,1}(t) = (1 - \gamma(t)) f_0(t), \quad \pi_{0,0}(t) = \gamma(t) f_0(t),$$

we can write the expression in (68) in parenthesis for, say $m_0 = m_1 = 1$, as

$$f_0(t) f_1(t) - \left(f_{0,1}(t) f_{1,1}(t) q_1^2(t) + f_{0,0}(t) f_{1,0}(t) q_0^2(t)\right)$$
$$= f_0(t) f_1(t) - \left(\pi_{0,1}(t) \pi_{1,1}(t) + \pi_{0,0}(t) \pi_{1,0}(t)\right)$$
$$= f_0(t) f_1(t) - \left((1 - \gamma(t)) f_0(t)\,\delta(t) f_1(t) + \gamma(t) f_0(t)\,(1 - \delta(t)) f_1(t)\right)$$
$$= f_0(t) f_1(t) \left((1 - \delta(t))(1 - \gamma(t)) + \gamma(t)\delta(t)\right). \qquad (69)$$

In a similar way /ioii(^) an-d /iooi(^) can 6e expressed in terms of the sen- 
sitivity, specificity, and probability of exposure integrated against the base- 
line hazard. Using ^6\ ) and the partial likelihood variance given in (A3) 
from Langholz and Borgan (1995), asymptotic efficiencies for the Mantel-Haenszel
estimator relative to the partial likelihood can be computed. Figure 2 shows
the asymptotic relative efficiencies by $\log(\phi_0)$ with $P(Z(t) = 1 \mid Y(t) = 1) = .2$
for $m_0, m_1 \in \{1,2\}$ when the conditional distribution of $(Z(t), C(t))$ given
$Y(t) = 1$ does not depend on $t$, which holds, approximately, for rare outcomes
when censoring does not depend on $(Z(t), C(t))$. Although there is
some difference in the relative efficiencies by choice of $m_0$ and $m_1$ and the
sensitivity and specificity of $C$ for $Z$, the Mantel-Haenszel estimator has fairly
high efficiency in a wide range of situations.

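Under the null $\phi_0 = 1$ in the classical case, the numerator of the variance
formula simplifies since

$$h_{011}(t) + h_{100}(t) = h_{01}(t),$$

yielding

$$\sigma^2 = \frac{1}{\int_0^\tau p(t) \left(f_0(t) f_1(t) - \sum_{l} \frac{1}{m_l} f_{0,l}(t) f_{1,l}(t) q_l^2(t)\right) \lambda_0(t)\,dt}. \qquad (70)$$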

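Under the null in general, using that $\sum_{k=0}^{\eta} X_{k,l}(t) = m_l$ and $EX_{k,l}(t) = m_l f_{k,l}(t)$,
we have

$$e(\phi_0,t) = \sum_{k=1}^{\eta} \sum_{l \in \mathcal{C}} a_k f_{k,l}(t) q_l(t) \quad \text{and} \quad \psi(\phi_0,t) = p(t)^{-1},$$

so in the classical case in particular

$$e(\phi_0,t) = \sum_{l \in \mathcal{C}} f_{1,l}(t) q_l(t).$$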

When $(m_0, m_1) = (1,1)$, so that the design matches one control with
'surrogate exposure' $C(t)$ value opposite to the exposure $Z(t)$ of the case,
substituting (69) into (70) yields

$$\sigma^2 = \frac{1}{\int_0^\tau p(t) f_0(t) f_1(t) \left((1 - \delta(t))(1 - \gamma(t)) + \delta(t)\gamma(t)\right) \lambda_0(t)\,dt},$$

which is equal to the asymptotic variance for the (1,1) counter matching
design when using the maximum partial likelihood estimator Langholz and 
Clayton (1994), ^ result expected based on the argument at the end of Section 
We note that, as in Langholz and Clayton (1994), when the sensitivity and
specificity are both close to 1 (or both close to zero), the counter matching design
has efficiency close to that of the full cohort.
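
The closing remark can be illustrated numerically. Assuming, as a simplification, that the joint distribution of $(Z(t), C(t))$ given $Y(t) = 1$ does not depend on $t$, the null variance of the counter matching design relative to the full cohort value $1/\int p f_0 f_1 \lambda_0\,dt$ gives the efficiency $1 - \sum_l \pi_{0,l}\pi_{1,l}/(m_l f_0 f_1)$, which for $(m_0, m_1) = (1,1)$ equals $(1-\delta)(1-\gamma) + \delta\gamma$. The Python sketch below evaluates this quantity; function names and numerical inputs are hypothetical.

```python
def joint_probs(delta, gamma, f1):
    """pi[(k, l)] = P(Z = k, C = l | at risk) from sensitivity, specificity, prevalence."""
    f0 = 1.0 - f1
    return {
        (1, 1): delta * f1,
        (1, 0): (1.0 - delta) * f1,
        (0, 1): (1.0 - gamma) * f0,
        (0, 0): gamma * f0,
    }


def cm_null_efficiency(delta, gamma, f1, m0=1, m1=1):
    """Null efficiency of counter matching relative to the full cohort (time-constant case)."""
    pi = joint_probs(delta, gamma, f1)
    f0 = 1.0 - f1
    m = {0: m0, 1: m1}
    loss = sum(pi[(0, l)] * pi[(1, l)] / m[l] for l in (0, 1))
    return 1.0 - loss / (f0 * f1)


for delta, gamma in [(0.5, 0.5), (0.8, 0.9), (0.99, 0.99)]:
    print(delta, gamma, round(cm_null_efficiency(delta, gamma, 0.2), 4))
```

An uninformative surrogate ($\delta = \gamma = 0.5$) gives efficiency $1/2$, the same as simple random sampling of one control, while a near-perfect surrogate pushes the efficiency toward one.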
Figure 1: Asymptotic efficiency by exposure rate ratio of Mantel-Haenszel
relative to the partial likelihood estimator for simple random sampling of
$m - 1$ controls. Probability of exposure $P(Z(t) = 1 \mid Y(t) = 1) = .2$.

Figure 2: Asymptotic efficiency by exposure rate ratio of Mantel-Haenszel
relative to the partial likelihood estimator for counter-matching by sensitivity
($\delta = P(Z(t) = 1 \mid C(t) = 1,\ Y(t) = 1)$) and specificity ($\gamma = P(Z(t) = 0 \mid C(t) = 0,\ Y(t) = 1)$).
Probability of exposure $P(Z(t) = 1 \mid Y(t) = 1) = .2$.